CN109933774A

CN109933774A - Semantic recognition method and device, storage medium, and electronic device

Info

Publication number: CN109933774A
Application number: CN201711353756.4A
Authority: CN
Inventors: 杨柳; 何朝阳
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-12-15
Filing date: 2017-12-15
Publication date: 2019-06-25

Abstract

The invention discloses a semantic recognition method and device, a storage medium, and an electronic device. The method comprises: obtaining a target text recognized from a target voice; searching a first database for a target word among the words of the target text, wherein the first database stores words with annotation information, and the annotation information indicates the domain to which each annotated word belongs; when the target word is found in the first database, determining the target word with its target annotation information in the first database as a segmented word of the target text, wherein the annotation information includes the target annotation information, and the target annotation information indicates the domain to which the target word belongs; determining the target semantics of the segmented word according to the target annotation information; and determining the semantics of the target text according to the target semantics of the segmented word. The invention solves the technical problem of low semantic recognition efficiency in the related art.

Description

Semantic recognition method and device, storage medium, and electronic device

Technical field

The present invention relates to the field of semantic recognition, and in particular to a semantic recognition method and device, a storage medium, and an electronic device.

Background art

At present, semantic recognition usually segments sentences with a maximum matching strategy over a full-domain data dictionary. For example, traditional dictionary-based natural language processing (NLP) algorithms depend on a large-scale word dictionary, and once matching entries are missing, the segmentation becomes ambiguous. In addition, the full-domain data dictionary does not fully cover the dictionary data needed in the automotive field.

A full dictionary of this kind occupies a large amount of memory, whereas a lightweight customized version does not fully cover the domain vocabulary. Furthermore, part-of-speech tagging is relatively basic, has no named entities of the automotive field, and cannot be customized.

Fig. 1 is a schematic diagram of semantic recognition in the related art. As shown in Fig. 1, the abnormal proportion from session start to voice recording is 8.8%, which includes abnormal voice activation (E1: 1%) and manual shutdown (E2: 7.8%). The abnormal proportion from the recorded voice to the text recognized from it is 6.3%, which consists of incorrectly recognized text (E3: 6.3%). The abnormal proportion from the recognized text to the recognized text semantics is 22.6%, which consists of incorrectly recognized semantics (E4: 22.6%). The abnormal proportion from the recognized text semantics to intent execution is 6.4%, which includes execution recognition (E5: 2.3%), execution timeout (E6: 1.4%), shutdown during execution (E7: 1.8%), and multi-round interaction errors (E8: 0.9%). The failure rate of semantic recognition is therefore 44.1%, and its overall success rate is 55.9%.
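The totals quoted for Fig. 1 can be checked by summing the per-stage figures. The following is only an arithmetic sketch; the stage labels are paraphrases, and E2 and E6 are taken as percentages, which the stage totals of 8.8% and 6.4% imply.

```python
# Per-stage abnormal proportions (percent) as quoted for Fig. 1.
# The dictionary keys are paraphrased labels, not names from the figure.
stage_rates = {
    "session start -> voice recorded": [1.0, 7.8],                    # E1, E2
    "voice recorded -> text recognized": [6.3],                       # E3
    "text recognized -> semantics recognized": [22.6],                # E4
    "semantics recognized -> intent executed": [2.3, 1.4, 1.8, 0.9],  # E5..E8
}

# Round to one decimal place to avoid floating-point noise.
stage_totals = {stage: round(sum(rates), 1) for stage, rates in stage_rates.items()}
failure_rate = round(sum(stage_totals.values()), 1)
success_rate = round(100 - failure_rate, 1)

print(stage_totals)
print(failure_rate, success_rate)
```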

As can be seen from the above, when the failure rates of the individual stages of speech recognition are counted in the voice backend, the stage of recognizing text semantics has a failure rate of up to 22.6%, so semantic recognition is inefficient.

Fig. 2 is a schematic diagram of recognition by a voice semantic platform in the related art. As shown in Fig. 2, recognition by the voice semantic platform suffers from missing instructions, intent errors, missing content, and function updates.

No effective solution has yet been proposed for the above problem of low semantic recognition efficiency.

Summary of the invention

The embodiments of the present invention provide a semantic recognition method and device, a storage medium, and an electronic device, to at least solve the technical problem of low semantic recognition efficiency in the related art.

According to one aspect of the embodiments of the present invention, a semantic recognition method is provided. The method comprises: obtaining a target text recognized from a target voice; searching a first database for a target word among the words of the target text, wherein the first database stores words with annotation information, and the annotation information indicates the domain to which each annotated word belongs; when the target word is found in the first database, determining the target word with its target annotation information in the first database as a segmented word of the target text, wherein the annotation information includes the target annotation information, and the target annotation information indicates the domain to which the target word belongs; determining the target semantics of the segmented word according to the target annotation information; and determining the semantics of the target text according to the target semantics of the segmented word.

According to another aspect of the embodiments of the present invention, a semantic recognition device is also provided. The device comprises: an obtaining unit, configured to obtain a target text recognized from a target voice; a searching unit, configured to search a first database for a target word among the words of the target text, wherein the first database stores words with annotation information, and the annotation information indicates the domain to which each annotated word belongs; a first determination unit, configured to, when the target word is found in the first database, determine the target word with its target annotation information in the first database as a segmented word of the target text, wherein the annotation information includes the target annotation information, and the target annotation information indicates the domain to which the target word belongs; a second determination unit, configured to determine the target semantics of the segmented word according to the target annotation information; and a third determination unit, configured to determine the semantics of the target text according to the target semantics of the segmented word.

In the embodiments of the present invention, a target text recognized from a target voice is obtained; a first database is searched for a target word among the words of the target text, wherein the first database stores words with annotation information, and the annotation information indicates the domain to which each annotated word belongs; when the target word is found in the first database, the target word with its target annotation information in the first database is determined as a segmented word of the target text, wherein the annotation information includes the target annotation information, and the target annotation information indicates the domain to which the target word belongs; the target semantics of the segmented word are determined according to the target annotation information; and the semantics of the target text are determined according to the target semantics of the segmented word. Because the segmented words of the target text carry annotation information marking the domain to which they belong, and the semantics of the target text are determined on that basis, the semantics of the target text can be recognized correctly. This overcomes the problems in the related art that a full dictionary occupies a large amount of memory while a lightweight customized version does not fully cover the domain vocabulary, achieves the technical effect of improving the efficiency of semantic recognition, and thereby solves the technical problem of low semantic recognition efficiency in the related art.

Brief description of the drawings

The drawings described herein provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:

Fig. 1 is a schematic diagram of semantic recognition in the related art;

Fig. 2 is a schematic diagram of recognition by a voice semantic platform in the related art;

Fig. 3 is a schematic diagram of the hardware environment of a semantic recognition method according to an embodiment of the present invention;

Fig. 4 is a flow chart of a semantic recognition method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of semantic interaction according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a semantic recognition system according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a domain lexicon model according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of a dictionary tree according to an embodiment of the present invention;

Fig. 9 is a flow chart of another semantic recognition method according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of a semantic recognition device according to an embodiment of the present invention; and

Fig. 11 is a structural block diagram of an electronic device according to an embodiment of the present invention.

Detailed description of the embodiments

To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units that are clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.

According to one aspect of the embodiments of the present invention, an embodiment of a semantic recognition method is provided.

Optionally, in this embodiment, the semantic recognition method can be applied to the hardware environment shown in Fig. 3, which consists of a server 302 and a terminal 304. Fig. 3 is a schematic diagram of the hardware environment of a semantic recognition method according to an embodiment of the present invention. As shown in Fig. 3, the server 302 is connected to the terminal 304 through a network. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network, and the terminal 304 is not limited to a PC, a mobile phone, a tablet computer, and the like. The semantic recognition method of the embodiment of the present invention may be executed by the server 302, by the terminal 304, or jointly by the server 302 and the terminal 304. When the terminal 304 executes the method, it may do so through a client installed on it.

Optionally, when the semantic recognition method of the embodiment of the present invention is executed by the terminal 304, it includes steps S31 to S33:

Step S31: determine the semantics of the target text recognized from the target voice.

The user issues the target voice to the terminal 304. The terminal 304 recognizes the obtained target voice to obtain the target text and determines the semantics of the target text.

The specific process of determining the semantics of the target text recognized from the target voice is as follows:

Step S311: obtain the target text recognized from the target voice.

Step S312: search the first database for the target word among the words of the target text, wherein the first database stores words with annotation information, and the annotation information indicates the domain to which each annotated word belongs.

Step S313: when the target word is found in the first database, determine the target word with its target annotation information in the first database as a segmented word of the target text.

Step S314: determine the target semantics of each segmented word according to the target annotation information.

Step S315: determine the semantics of the target text according to the target semantics of the segmented words.

Step S32: report information that includes the semantics of the target text.

After the semantics of the target text recognized from the target voice are determined, information that includes the semantics of the target text is reported to the server 302. For example, data indicating the semantics of the target text and at least one segmented word is reported.

Step S33: update the first database.

The database in the server 302 is optimized with the reported information that includes the semantics of the target text. The local first database is then updated by receiving data from the database in the server 302, which forms a closed feedback-optimization loop.

Steps S31 to S33 form a complete implementation process that includes the technical solution of this application. The technical solution of this application mainly concerns step S31, which is described in detail below with reference to specific embodiments.

Fig. 4 is a flow chart of a semantic recognition method according to an embodiment of the present invention. As shown in Fig. 4, step S402 corresponds to step S311, step S404 to step S312, step S406 to step S313, step S408 to step S314, and step S410 to step S315. The method may include the following steps:

Step S402: obtain the target text recognized from the target voice.

In the technical solution provided by step S402 of this application, the target voice is voice input through a voice input device. For example, the target voice "navigate to Shenzhen University" is input through a microphone. After the target voice is obtained, it is recognized to obtain the corresponding target text, and semantic understanding, that is, semantic recognition, is performed on the target text. This is the key step after the target text has been recognized from the target voice.

Step S404: search the first database for the target word among the words of the target text.

In the technical solution provided by step S404 of this application, the first database is searched for the target word among the words of the target text, wherein the first database stores words with annotation information, and the annotation information indicates the domain to which each annotated word belongs.

The first database of this embodiment can store a large amount of data from multiple domains, for example data from the operations mining, music, map, and communication domains. It can also store data from other domains related to in-vehicle voice interaction, which is not limited here. The first database can serve as a domain lexicon model. The data it stores includes words with annotation information, and the annotation information indicates the domain to which each annotated word belongs. Optionally, a word already carries its annotation information when it is imported into the first database.

Optionally, the data in the first database of this embodiment can be kept synchronized with the data in a cloud dictionary. After the data in the cloud dictionary is updated, the latest data can be written to the first database in real time in the background. For example, after map data and listening data are automatically updated, the latest map data and listening data are written to the first database, which ensures that matching against the data in the first database is correct.

The target text of this embodiment may consist of multiple words. After the target text recognized from the target voice is obtained, at least one word of the target text is obtained. The target text can be cut to obtain at least one word; optionally, the target text is cut based on a dictionary. For example, cutting the target text "play Heart Too Soft" yields the two words "play" and "Heart Too Soft". Searching the first database for a target word among the words of the target text means selecting a target word from the words of the target text and looking it up in the first database. For example, "Heart Too Soft" is determined as the target word of the target text and is looked up in the first database.
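The dictionary-based cutting described above can be sketched with forward maximum matching, the strategy named in the background section. The text does not state which cutting algorithm the embodiment uses, so this is only an illustration. The function name, the tiny dictionary, and the Chinese strings (assumed originals of the translated example "play" plus the song title) are all assumptions.

```python
def forward_max_match(text, dictionary):
    """Cut `text` greedily from left to right, preferring the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so that cutting never stalls.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical dictionary: "播放" (play) and "心太软" (a song title).
dictionary = {"播放", "心太软"}
print(forward_max_match("播放心太软", dictionary))
```

Characters that match no dictionary entry fall through as single-character words, which is one simple way to handle unknown input.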

Step S406: when the target word is found in the first database, determine the target word with its target annotation information in the first database as a segmented word of the target text.

In the technical solution provided by step S406 of this application, when the target word is found in the first database, the target word with its target annotation information in the first database is determined as a segmented word of the target text.

The first database of this embodiment stores words with annotation information. If the target word is found in the first database, the target word that is found has target annotation information, and this target annotation information indicates the domain to which the target word belongs. The target word with its target annotation information is determined as a segmented word of the target text, which realizes word segmentation of the target text.

By performing word segmentation on the target text in this way, the embodiment can recognize complete named entities. For example, the first database stores data of the map domain, and the target text "navigate to Shenzhen University" is segmented against it. "Shenzhen University" is found in the first database as a whole, rather than as "Shenzhen" and "University", and it carries annotation information for a geographic location, for example NP, so the entity "Shenzhen University" is obtained in full.

Optionally, the target word "Nan Shan Nan" among the words of the target text is searched for in the first database, and the annotated target word "Nan Shan Nan/NM" is obtained. "Nan Shan Nan/NM" is then determined as a segmented word of the target text. The annotation information "NM" of the word "Nan Shan Nan" marks its domain as music, so it can further be determined that "Nan Shan Nan" is the song "Nan Shan Nan" in the music domain rather than the address "Nan Shan Nan" in the geographic domain.
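As a minimal sketch of this lookup, the first database can be modelled as a mapping from word to domain tag. The mapping, the function name, and the Chinese strings (assumed originals of the song/place name and of "Shenzhen University" in the examples above) are illustrative assumptions; only the "word/TAG" form and the tags NM and NP come from the text.

```python
# Hypothetical in-memory stand-in for the first database: word -> domain tag.
FIRST_DATABASE = {
    "深圳大学": "NP",  # Shenzhen University, geographic location
    "南山南": "NM",    # the song in the example, music domain
}


def to_segmented_word(target_word, database):
    """Return "word/TAG" if the target word is in the database, else None."""
    tag = database.get(target_word)
    if tag is None:
        return None
    return f"{target_word}/{tag}"


print(to_segmented_word("南山南", FIRST_DATABASE))
print(to_segmented_word("深圳大学", FIRST_DATABASE))
```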

Optionally, the segmented words of the target text in this embodiment may also carry part-of-speech tags, for example noun (n) or verb (v).

Optionally, when searching for the target word in the first database, this embodiment uses a Trie, an efficient storage and retrieval structure, which greatly improves the efficiency of looking up the target word. A double-array Trie structure is then introduced to further reduce the memory footprint of an ordinary Trie. A Trie is a tree structure; because it is a way of storing a dictionary, it is also called a dictionary tree. Each word in the dictionary appears in the Trie as a path starting from the root node, and the characters on the edges of the path, joined together, form the word. A double-array Trie (Double-Array Trie) is a simple and effective implementation of a Trie that consists of two integer arrays; the identifier of each segmented word is queried in the double-array Trie.
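A plain (non-double-array) Trie lookup of this kind can be sketched as follows. The class and method names and the sample entries are assumptions made for illustration; the text does not give an implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.tag = None     # domain tag if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, tag):
        """Store `word` as a path from the root and record its tag at the last node."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.tag = tag

    def lookup(self, word):
        """Follow the path for `word`; return its tag, or None if it is not stored."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.tag


trie = Trie()
trie.insert("南山南", "NM")    # hypothetical entries
trie.insert("深圳大学", "NP")
print(trie.lookup("南山南"), trie.lookup("南山"))
```

A lookup visits one node per character of the query, so its cost depends on the word length rather than on the number of stored words.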

Step S408: determine the target semantics of the segmented word according to the target annotation information.

In the technical solution provided by step S408 of this application, after the target word with its target annotation information in the first database is determined as a segmented word of the target text, the target semantics of the segmented word are determined according to the target annotation information.

The target annotation information of this embodiment indicates the domain to which the target word belongs, and the target semantics of the segmented word can be determined from it. For example, the target word "Nan Shan Nan" has two meanings: one is the song "Nan Shan Nan", and the other is the geographic location "Nan Shan Nan". When the target annotation information of the target word is "NM", it can be determined from "NM" that "Nan Shan Nan" is a word in the music domain, so the target semantics of the segmented word "Nan Shan Nan/NM" are the song "Nan Shan Nan". When the target annotation information of the target word is "NP", it can be determined from "NP" that "Nan Shan Nan" is a word in the geographic domain, so the target semantics of the segmented word "Nan Shan Nan/NP" are the geographic location "Nan Shan Nan".
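The disambiguation by tag can be sketched as a simple table lookup. Only the tags NM and NP and their meanings come from the text; the table name, the function name, and the Chinese string (assumed original of the ambiguous name in the example) are assumptions.

```python
# Tag -> meaning, as described above; the table itself is a hypothetical structure.
SEMANTICS_BY_TAG = {
    "NM": "song",
    "NP": "geographic location",
}


def target_semantics(segmented_word):
    """Split "word/TAG" and map the tag to the meaning it indicates."""
    word, _, tag = segmented_word.rpartition("/")
    return word, SEMANTICS_BY_TAG[tag]


print(target_semantics("南山南/NM"))
print(target_semantics("南山南/NP"))
```

The same surface word yields different semantics purely because its annotation differs, which is the point of attaching domain information at segmentation time.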

Step S410: determine the semantics of the target text according to the target semantics of the segmented words.

In the technical solution provided by step S410 of this application, after the target semantics of the segmented words of the target text are determined according to the target annotation information, the semantics of the target text are determined according to those target semantics, and the command indicated by the semantics of the target text is then executed. For example, according to the target semantics of "navigate", "to", and "Shenzhen University", the semantics of the target text "navigate to Shenzhen University" are determined to be navigating to Shenzhen University, and the navigation command is then executed to perform the navigation.
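One way to picture this step is a rule that turns tagged segmented words into a command. The text does not specify how the semantics are combined, so the rule below, the intent names, and the output format are purely hypothetical; the Chinese strings are assumed originals of "navigate", "to", and "Shenzhen University".

```python
def text_semantics(segmented_words):
    """Derive a command from (word, tag) pairs; tag is None for unannotated words.

    Hypothetical rule: the first domain-tagged word decides the intent.
    """
    for word, tag in segmented_words:
        if tag == "NP":
            return {"intent": "navigate", "destination": word}
        if tag == "NM":
            return {"intent": "play_music", "song": word}
    return {"intent": "unknown"}


# "navigate to Shenzhen University", already segmented and annotated.
command = text_semantics([("导航", "v"), ("到", None), ("深圳大学", "NP")])
print(command)
```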

Through steps S402 to S410 above, a target text recognized from a target voice is obtained; a first database is searched for a target word among the words of the target text, wherein the first database stores words with annotation information, and the annotation information indicates the domain to which each annotated word belongs; when the target word is found in the first database, the target word with its target annotation information in the first database is determined as a segmented word of the target text, wherein the annotation information includes the target annotation information, and the target annotation information indicates the domain to which the target word belongs; the target semantics of the segmented word are determined according to the target annotation information; and the semantics of the target text are determined according to the target semantics of the segmented word. Because the segmented words of the target text carry annotation information marking the domain to which they belong, and the semantics of the target text are determined on that basis, the semantics of the target text can be recognized correctly. This achieves the technical effect of improving the efficiency of semantic recognition and thereby solves the technical problem of low semantic recognition efficiency in the related art.

As an optional embodiment, step S404 of searching the first database for the target word among the words of the target text includes: selecting at least one word from the words of the target text and determining the at least one word as the target word; and searching for the target word in the first database.

In this embodiment, the target text includes multiple words. For example, the target text "need to play Heart Too Soft" may include the three words "need", "play", and "Heart Too Soft". Selecting at least one word from the words of the target text may mean selecting all of the words, one of them, or two or more of them. For example, among the words "need", "play", and "Heart Too Soft" in the target text "need to play Heart Too Soft", "need" and "play" are verbs whose semantics are already clear, so only the word "Heart Too Soft" may be selected. The at least one selected word is determined as the target word of the target text; for example, the word "Heart Too Soft" is determined as the target word of the target text "need to play Heart Too Soft". After at least one word is selected from the words of the target text and determined as the target word, the target word is searched for in the first database. For example, the first database includes data of the music domain; if "Heart Too Soft" is found in the first database, the annotated "Heart Too Soft/NM" is used as a segmented word of the target text "need to play Heart Too Soft".
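The selection described above, dropping verbs whose meaning is already clear and looking up only the rest, can be sketched like this. The verb list, the function names, and the Chinese strings (assumed originals of "need", "play", and the song title) are assumptions, not part of the described method.

```python
# Hypothetical list of verbs whose meaning is clear without a database lookup.
KNOWN_VERBS = {"需要", "播放"}  # "need", "play"


def select_target_words(words, verbs=KNOWN_VERBS):
    """Keep only the words that still need to be looked up."""
    return [word for word in words if word not in verbs]


def find_segmented_words(words, database):
    """Look up the selected target words and return them in "word/TAG" form."""
    return [
        f"{word}/{database[word]}"
        for word in select_target_words(words)
        if word in database
    ]


music_database = {"心太软": "NM"}  # hypothetical music-domain entry
words = ["需要", "播放", "心太软"]
print(select_target_words(words))
print(find_segmented_words(words, music_database))
```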

As an optional embodiment, before the first database is searched for the target word among the words of the target text, the method further includes: obtaining words with annotation information in multiple domains from a server, wherein the words are annotated with the annotation information on the server; and importing the words with annotation information into the first database.

In this embodiment, the first database can store words with annotation information in multiple domains. The server can obtain the words in multiple domains and annotate them so that the words in each domain all carry annotation information. Optionally, the multiple domains may be the operations mining, music, map, communication, and school domains, which is not limited here. The words in the multiple domains are annotated automatically on the server. For example, the server adds annotation information at a target position of each word, which may be in front of or behind the word. For example, adding the annotation information "NM" behind the word "Nan Shan Nan" yields "Nan Shan Nan/NM", which realizes automatic annotation of the word.

After the words with annotation information in multiple domains are obtained from the server, they are imported into the first database. For example, the annotated words of the operations mining domain, the annotated words of the map domain, the annotated words of listening content in the music domain, and the words of the local address book in the communication domain are imported into the first database. In addition, words from other domains related to in-vehicle voice interaction can also be imported, which is not limited here.
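Importing server-annotated entries into a local store can be sketched as parsing "word/TAG" strings into a mapping. The function name, the dictionary-based store, and the sample entries are assumptions; only the "word/TAG" form with the tag behind the word comes from the text.

```python
def import_annotated(entries, database=None):
    """Parse "word/TAG" entries and add them to `database` (a word -> tag dict)."""
    database = {} if database is None else database
    for entry in entries:
        word, separator, tag = entry.rpartition("/")
        if not separator or not word or not tag:
            continue  # skip entries that carry no annotation
        database[word] = tag
    return database


# Hypothetical entries received from the server for two domains.
server_entries = ["心太软/NM", "深圳大学/NP", "无标注"]
first_database = import_annotated(server_entries)
print(first_database)
```

Entries without a tag are skipped here so that every word in the local store is guaranteed to carry domain information.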

As an optional embodiment, step S404 of searching the first database for the target word among the words of the target text includes: searching for the target word among the words of the target text in a first database that has a dictionary tree structure, wherein the words with annotation information in multiple domains are distributed over multiple paths of the dictionary tree structure.

In this embodiment, a dictionary tree is a tree structure and a way of storing a dictionary. Each word in the dictionary appears in the dictionary tree as a path starting from the root node, and the characters on the edges of the path, joined together, form the word. The target word among the words of the target text is searched for in the first database with the dictionary tree structure. Words in the dictionary tree have common prefixes, and a common prefix is stored in shared space; that is, the common prefixes of the dictionary data can share storage. Retrieval efficiency over massive dictionary data is independent of the number of dictionary entries, and compared with a scheme that builds a hash table, splices words from left to right, and then looks them up in the hash table, this saves more space. The words with annotation information in multiple domains in this embodiment are distributed over multiple paths of the dictionary tree structure, so that segmented words can be looked up along the paths of the dictionary tree.
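The space saving from shared prefixes can be illustrated by counting nodes in a small nested-dictionary trie. The three place names are hypothetical entries that share a two-character prefix, and the helper names are assumptions.

```python
def build_trie(words):
    """Build a trie as nested dicts: each key is a character, each value a sub-trie."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def count_nodes(node):
    """Count the character nodes below `node`."""
    return sum(1 + count_nodes(child) for child in node.values())


# Hypothetical map-domain entries that all start with the same two characters.
words = ["深圳大学", "深圳湾", "深圳北站"]
trie = build_trie(words)

total_characters = sum(len(word) for word in words)
print(total_characters, count_nodes(trie))
```

The shared prefix is stored once, so the trie needs fewer character nodes than the total number of characters in the word list.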

Optionally, the dictionary tree of this embodiment includes a double-array dictionary tree, which can reduce memory waste to a certain extent. A double-array dictionary tree consists of two integer arrays. One is the array base[], which can represent the base address of successor nodes; its value is the base value of a state transition. The other is check[], which identifies the address of the predecessor node and acts as a check value for determining whether a state exists. Each state corresponds to a word. If the array index is i, each word can be represented by an index into the double array, and the identifier of each word is queried in the double-array dictionary tree. If base[i] and check[i] are both 0, the position is empty. If base[i] is negative, the state is a terminal state (that is, a word). check[i] indicates the previous state of the state. When the target word among the words of the target text is searched for in the first database with the double-array dictionary tree structure, the query time of each word in the double-array dictionary tree is related to the length of the word: the shorter the word, the shorter the query time, and the longer the word, the longer the query time.
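The base[]/check[] mechanics can be sketched with a small builder and lookup. This follows the conventions stated above (0 for an empty position, negative base for a terminal state, check holding the previous state), but the construction strategy, the character codes, the root index of 1, and all names are assumptions rather than the described implementation.

```python
from collections import deque


def build_double_array(words):
    """Build base/check arrays for `words`; index 1 is the root, index 0 is unused."""
    codes = {ch: i + 1 for i, ch in enumerate(sorted({c for w in words for c in w}))}
    root = {"kids": {}, "end": False}
    for word in words:
        node = root
        for ch in word:
            node = node["kids"].setdefault(ch, {"kids": {}, "end": False})
        node["end"] = True

    base, check = [0, 0], [0, 0]
    queue = deque([(root, 1)])
    while queue:
        node, state = queue.popleft()
        kid_codes = [codes[ch] for ch in node["kids"]]
        offset = 1
        if kid_codes:
            while True:
                # Grow both arrays so every candidate slot exists.
                while len(check) <= offset + max(kid_codes):
                    base.append(0)
                    check.append(0)
                # A slot is free if no state has claimed it yet.
                if all(check[offset + c] == 0 for c in kid_codes):
                    break
                offset += 1
            for ch, child in node["kids"].items():
                target = offset + codes[ch]
                check[target] = state  # remember the previous state
                queue.append((child, target))
        # A negative base marks a terminal state (a complete word).
        base[state] = -offset if node["end"] else offset
    return base, check, codes


def contains(word, base, check, codes):
    """Follow transitions t = |base[s]| + code and verify them with check[t] == s."""
    state = 1
    for ch in word:
        code = codes.get(ch)
        if code is None:
            return False
        target = abs(base[state]) + code
        if target >= len(check) or check[target] != state:
            return False
        state = target
    return base[state] < 0


base, check, codes = build_double_array(["南山", "南山南", "深圳大学"])
print(contains("南山南", base, check, codes), contains("深圳", base, check, codes))
```

Each lookup performs one array access per character, which matches the statement that query time depends on word length.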

The double-array dictionary tree of this embodiment can be built as a dynamic retrieval structure to solve the problems that arise with insertion and deletion. For example, when a new base value is inserted, only the empty states need to be traversed. A sequence of all empty states can be built, and only this sequence needs to be scanned when the base value is determined. The useless nodes produced when a leaf node is deleted can be set to empty so that they are reused when new words are inserted. After a state is deleted, consecutive empty states may appear at the end of the array, and these are also removed directly, which further reduces the memory footprint.

As an optional embodiment, before the target word among the words of the target text is searched for in the first database, the method further includes: obtaining, from a server, updated words with markup information in multiple fields; and adding the updated words with markup information to the first database to obtain an updated first database. Searching the first database for the target word among the words of the target text includes: searching the updated first database for the target word among the words of the target text.

In this embodiment, the data in different fields are updated over time. To guarantee the accuracy of semantic recognition, the words with markup information stored in the first database also need to be updated continuously. The updated words with markup information in multiple fields are obtained from the server, so that the words in the first database can be synchronized with the words in the cloud dictionary. The cloud dictionary stores the words of multiple fields, and a dictionary synchronization module can keep the words stored in the first database synchronized with the data in the cloud dictionary. After the updated words with markup information in multiple fields are obtained from the server, they are added to the first database to obtain an updated first database. For example, after product operations staff edit data in the back end, or after map data or listening data are updated automatically, the updated data are added to the first database to obtain the updated first database. The target word among the words of the target text is then searched for in the updated first database, which ensures that the words in the first database can subsequently be matched correctly with the words in the target text and improves the accuracy of semantic recognition.

Optionally, in this embodiment, the markup information is used to mark the field to which a word belongs, and the markup information can also be updated. When updated data are added to the first database to obtain the updated first database, updated markup information can be obtained. Optionally, the markup information initially indicates a first field to which the word belongs and is now updated to indicate a second field to which the word belongs. For example, the first field may be the pop music field and the second field may be the jazz field: the markup information initially indicates that the word belongs to the pop music field and is now updated to indicate that the word belongs to the jazz field. After the updated markup information is obtained, it is added to the first database to obtain the updated first database, which ensures that the words in the first database can subsequently be matched correctly with the words in the target text and improves the accuracy of semantic recognition.

As an optional embodiment, obtaining the updated words with markup information in multiple fields from the server includes: obtaining, from the server, newly added words with markup information in multiple fields, where the newly added words with markup information are not stored in the first database, and the updated words with markup information include the newly added words with markup information. Adding the updated words with markup information to the first database to obtain the updated first database includes: adding the newly added words with markup information to the first database to obtain the updated first database.

In this embodiment, the data in different fields are updated over time, and the updated data may be data newly added in different fields. When the updated words with markup information in multiple fields are obtained from the server, the newly added words with markup information in multiple fields are obtained from the server; these newly added words have never been stored in the first database. For example, a newly added word is an emerging, popular buzzword in its field, and the markup information described above is attached to the newly added word on the server. After the newly added words with markup information in multiple fields are obtained from the server, they are added to the first database to obtain the updated first database. For example, a newly added buzzword with markup information is added to the first database to obtain the updated first database, so that the content of the first database keeps up with the updates to the words in multiple fields. This ensures that the words in the first database can subsequently be matched correctly with the words in the target text and improves the accuracy of semantic recognition.

As an optional embodiment, obtaining the updated words with markup information in multiple fields from the server includes: obtaining, from the server, modified words with markup information in multiple fields, where the first database stores the pre-modification words with markup information that correspond to the modified words with markup information, and the updated words with markup information include the modified words with markup information. Adding the updated words with markup information to the first database to obtain the updated first database includes: replacing the pre-modification words with markup information stored in the first database with the modified words with markup information, to obtain the updated first database.

In this embodiment, the data in different fields are updated over time, and the updated data may be data obtained by modifying original data in different fields. When the updated words with markup information in multiple fields are obtained from the server, the modified words with markup information in multiple fields are obtained from the server, and the first database stores the pre-modification words with markup information that correspond to the modified words. For example, the modified word with markup information is "Nan Shannan/NM", and the pre-modification word with markup information is "Nan Shannan/NP". After the modified words with markup information in multiple fields are obtained from the server, the pre-modification words with markup information stored in the first database are replaced with the modified words with markup information to obtain the updated first database. For example, the pre-modification word "Nan Shannan/NP" stored in the first database is replaced with the modified word "Nan Shannan/NM" to obtain the updated first database, so that the content of the first database keeps up with the updates to the words in multiple fields. This ensures that the words in the first database can subsequently be matched correctly with the words in the target text and improves the accuracy of semantic recognition.
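The two update cases described above (adding a newly added word and replacing the markup of a modified word) can be sketched as a single merge step. This is a minimal illustration that assumes the first database can be modeled as an in-memory mapping from a word to its list of markup tags; the function name apply_updates and the sample word "new hit" are hypothetical.

```python
def apply_updates(local, updates):
    """Merge server-side updates into the local word -> tags mapping.

    Returns the words that were newly added and the words whose markup
    information was replaced, so that the caller can log the change.
    """
    added, replaced = [], []
    for word, tags in updates.items():
        if word not in local:
            added.append(word)       # newly added word with markup information
        elif local[word] != list(tags):
            replaced.append(word)    # modified markup replaces the stored one
        local[word] = list(tags)
    return added, replaced


local = {"Nan Shannan": ["NP"]}
updates = {"Nan Shannan": ["NM"], "new hit": ["NM"]}  # "new hit" is a made-up entry
added, replaced = apply_updates(local, updates)
print(added, replaced, local)
```

Returning the two lists separately keeps the "newly added" and "modified" cases distinguishable even though both end in the same assignment.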

As an optional embodiment, step S404, searching the first database for the target word among the words of the target text, includes: obtaining multiple division results produced by dividing the target text, where each division result consists of words of the target text; determining, among the multiple division results, a target division result that satisfies a target rule; and searching the first database for the target word among the words of the target division result.

In this embodiment, the target text is divided to obtain multiple division results; the division can be performed by a decision model. A target rule, which is the rule used to divide the target text, is imported into the decision model. The multiple division results produced by dividing the target text are then obtained. For example, if the target text is "play Heart Too Soft", three adjacent words can be split off based on the dictionary, giving the division results "broadcasting_heart_too soft", "broadcasting_heart_too", "broadcast_trust_too soft", "broadcast_trust_too" and "broadcast_put_heart", each of which consists of words of the target text. After the multiple division results produced by word segmentation of the target text are obtained, the target division result that satisfies the target rule is determined among them. For example, the division result with the largest total length, the largest average length, the smallest variation in word length and the highest single-character free morpheme degree is determined among the multiple division results, and the division result that satisfies the target rule is taken as the target division result. At least one segmented word of the target division result is then obtained, the markup information of each segmented word is determined in the first database, the target semantics of each segmented word in the target field are determined according to the markup information, and finally the semantics of the target text are determined according to the target semantics of each segmented word, which achieves the technical effect of improving the efficiency of semantic recognition.

As an optional embodiment, determining, among the multiple division results, the target division result that satisfies the target rule includes: among the multiple division results, obtaining the sum of the lengths of all words in each division result and determining the first division results, for which the sum of the lengths of all words is the largest, where the number of first division results is a first quantity; when the first quantity is 1, determining the first division result as the target division result that satisfies the target rule; when the first quantity is not 1, obtaining, among the first division results, the average length of all words in each first division result and determining the second division results, for which the average length of all words is the largest, where the number of second division results is a second quantity and the second quantity is less than or equal to the first quantity; when the second quantity is 1, determining the second division result as the target division result that satisfies the target rule; when the second quantity is not 1, obtaining, among the second division results, the variation in word length of all words in each second division result and determining the third division results, for which the variation in word length is the smallest, where the number of third division results is a third quantity and the third quantity is less than or equal to the second quantity; when the third quantity is 1, determining the third division result as the target division result that satisfies the target rule; when the third quantity is not 1, obtaining, among the third division results, the free morpheme degree of all words in each third division result and determining the fourth division results, for which the free morpheme degree of all words is the highest, where the number of fourth division results is a fourth quantity, the fourth quantity is less than or equal to the third quantity, and the free morpheme degree indicates the probability that a word and a morpheme form a new word; and when the fourth quantity is 1, determining the fourth division result as the target division result that satisfies the target rule.

In this embodiment, when the target division result that satisfies the target rule is determined among the multiple division results, the sum of the lengths of all words in each division result is obtained and the first division results, for which this sum is the largest, are determined; the number of first division results is the first quantity. When the first quantity is 1, the first division result is determined as the target division result that satisfies the target rule.

For example, the sum of the lengths of all words in each division result is obtained, where length is counted in characters of the original text. Suppose the division results are "broadcasting_heart_too soft", "broadcasting_heart_too", "broadcast_trust_too soft", "broadcast_trust_too" and "broadcast_put_heart". "broadcasting_heart_too soft" has 3 words, "broadcasting", "heart" and "too soft", with a total length of 5. "broadcasting_heart_too" has 3 words, "broadcasting", "heart" and "too", with a total length of 4. "broadcast_trust_too soft" has 3 words, "broadcast", "trust" and "too soft", with a total length of 5. "broadcast_trust_too" has 3 words, "broadcast", "trust" and "too", with a total length of 4. "broadcast_put_heart" has 3 words, "broadcast", "put" and "heart", with a total length of 3. The first division results, for which the sum of the lengths of all words is the largest, are then determined. Because "broadcasting_heart_too soft" and "broadcast_trust_too soft" both have a total length of 5, the first division results are "broadcasting_heart_too soft" and "broadcast_trust_too soft". The first quantity is 2, not 1, so the target division result cannot yet be determined.

When the first quantity is not 1, the average length of all words in each first division result is obtained among the first division results, and the second division results, for which the average length of all words is the largest, are determined. The number of second division results is the second quantity, which is less than or equal to the first quantity. When the second quantity is 1, the second division result is determined as the target division result that satisfies the target rule.

For example, the average length of all segmented words in each first division result is obtained. "broadcasting_heart_too soft" has 3 segmented words with an average length of 1.667 (5/3 = 1.667), and "broadcast_trust_too soft" has 3 segmented words with an average length of 1.667 (5/3 = 1.667). Because both have an average length of 1.667, "broadcasting_heart_too soft" and "broadcast_trust_too soft" are both second division results. The second quantity is 2, not 1, so the target division result cannot yet be determined.

When the second quantity is not 1, the variation in word length of all words in each second division result is obtained among the second division results, and the third division results, for which the variation in word length is the smallest, are determined. The number of third division results is the third quantity, which is less than or equal to the second quantity. When the third quantity is 1, the third division result is determined as the target division result that satisfies the target rule.

For example, the variation in word length of all segmented words in each second division result is obtained; this variation is the standard deviation of the word lengths of the segmented words. The second division result "broadcasting_heart_too soft" has 3 segmented words, "broadcasting", "heart" and "too soft", with lengths 2, 1 and 2, so its variation in word length is sqrt(((2-1.667)^2+(1-1.667)^2+(2-1.667)^2)/(3-1)) = 0.577. The second division result "broadcast_trust_too soft" has 3 segmented words, "broadcast", "trust" and "too soft", with lengths 1, 2 and 2, so its variation in word length is sqrt(((1-1.667)^2+(2-1.667)^2+(2-1.667)^2)/(3-1)) = 0.577. Because both second division results have a variation in word length of 0.577, the third division results, for which the variation is the smallest, are "broadcasting_heart_too soft" and "broadcast_trust_too soft". The third quantity is 2, not 1, so the target division result cannot yet be determined.

When the third quantity is not 1, the free morpheme degree of all words in each third division result is obtained among the third division results, and the fourth division results, for which the free morpheme degree of all words is the highest, are determined. The number of fourth division results is the fourth quantity, which is less than or equal to the third quantity, and the free morpheme degree indicates the probability that a word and a morpheme form a new word. When the fourth quantity is 1, the fourth division result is determined as the target division result that satisfies the target rule.

For example, the free morpheme degree of all segmented words in each third division result is obtained, that is, the free morpheme degrees of the third division results "broadcasting_heart_too soft" and "broadcast_trust_too soft". A free morpheme is a morpheme that can stand alone as a word and can also combine with other morphemes to form words. The free morpheme degree indicates the probability that a segmented word forms a word and can be expressed as a natural logarithm. The third division result "broadcasting_heart_too soft" has a total length of 5, an average length of 1.667, a variation in word length of 0.577 and a natural logarithm result of 13.0072. The third division result "broadcast_trust_too soft" has a total length of 5, an average length of 1.667, a variation in word length of 0.577 and a natural logarithm result of 10.1699. The third division result "broadcasting_heart_too soft" therefore has the highest single-character free morpheme degree and is the fourth division result; the fourth quantity is 1, so the fourth division result "broadcasting_heart_too soft" is determined as the target division result that satisfies the target rule. At least one segmented word of the target division result is then obtained, the target semantic data of each segmented word are determined, and finally the semantics of the target text are determined according to the target semantics of each segmented word, which achieves the technical effect of improving the efficiency of semantic recognition.
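The four rules walked through above can be sketched as a cascade of filters that stops as soon as one candidate remains. This is a minimal sketch, not the implementation of the embodiment. Each candidate is written with Latin letters standing in for the five characters of the example text, so "ab" is the two-character word glossed as "broadcasting" and "bc" is the two-character word glossed as "trust". The rule-four scores 13.0072 and 10.1699 are taken from the worked example as given rather than computed, and the names keep_best and select_division are hypothetical.

```python
from statistics import mean, stdev


def keep_best(cands, key, pick, tol=1e-9):
    """Keep every candidate whose key equals the best key (within a tolerance)."""
    best = pick(key(c) for c in cands)
    return [c for c in cands if abs(key(c) - best) <= tol]


def select_division(cands, morpheme_score):
    """Apply the four rules in order; stop as soon as one candidate is left."""
    def lengths(c):
        return [len(w) for w in c]

    rules = [
        (lambda c: sum(lengths(c)), max),             # rule 1: largest total length
        (lambda c: mean(lengths(c)), max),            # rule 2: largest average length
        (lambda c: stdev(lengths(c)), min),           # rule 3: smallest sample std dev
        (lambda c: morpheme_score.get(c, 0.0), max),  # rule 4: highest morpheme score
    ]
    for key, pick in rules:
        if len(cands) == 1:
            break
        cands = keep_best(cands, key, pick)
    return cands


candidates = [
    ("ab", "c", "de"),  # lengths 2, 1, 2
    ("ab", "c", "d"),   # lengths 2, 1, 1
    ("a", "bc", "de"),  # lengths 1, 2, 2
    ("a", "bc", "d"),   # lengths 1, 2, 1
    ("a", "b", "c"),    # lengths 1, 1, 1
]
# Scores copied from the worked example; how they are derived is not shown here.
scores = {("ab", "c", "de"): 13.0072, ("a", "bc", "de"): 10.1699}
print(select_division(candidates, scores))
```

If more than one candidate survives all four rules, the function returns all of them, which leaves the choice of a fallback strategy to the caller.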

Optionally, if the fourth quantity is not 1, one division result can be selected at random from the fourth division results as the target division result. The target word among the words of the target division result is then searched for in the first database; when the target word is found in the first database, the target word with target markup information in the first database is determined as a segmented word of the target text, the target semantics of the segmented word are determined according to the target markup information, and the semantics of the target text are determined according to the target semantics of the segmented word. If the semantics of the target text are inaccurate, the user can give feedback on the result of this semantic recognition of the target text, together with the semantics the user considers correct. The feedback data of the user are obtained and reported to the server. Based on the feedback data collected online, product and operations staff can constrain the division results produced by the terminal, to reduce the probability that the randomly selected division result above is chosen as the target division result next time.

Optionally, if the fourth quantity is not 1, all the fourth division results can first be retained, and the words in each fourth division result are then searched for in the first database. As soon as the words of one fourth division result are found in the first database and the semantics of the target text are determined from them, the other fourth division results are discarded; that is, the search of the first database for the data in the other fourth division results is stopped.

As an optional embodiment, after the semantics of the target text are determined according to the target semantics of the segmented word, the method further includes: reporting, to the server, data indicating the semantics of the target text, where the data indicating the semantics of the target text are used to update a second database on the server, and the second database is used to store the words of multiple fields; and updating the first database through the second database.

In this embodiment, after the semantics of the target text are determined according to the target semantics of the segmented word, the data indicating the semantics of the target text and at least one segmented word of the target text can be reported to the server. The server holds the second database, that is, the cloud dictionary, which is used to store semantic data in the target field. The second database is updated with the data indicating the semantics of the target text and the at least one segmented word, and these are then imported from the second database into the first database, which updates the first database. This ensures that the semantic data in the first database are matched correctly and improves the efficiency of semantic recognition.

Optionally, this embodiment combines a local offline speech word segmentation system with online dynamic incremental updating of field data, to ensure that the in-vehicle device can still perform speech semantic recognition without a network (for example, in a garage or a remote area). When the network is good, the local offline field model library can be updated automatically.

As an optional embodiment, step S402, obtaining the target text recognized from the target voice, includes: obtaining the target text recognized from the target voice received by an in-vehicle device or a voice input device.

The semantic recognition method of this embodiment is applicable to in-vehicle semantic scenarios, for example, an in-vehicle hardware environment. When the target text recognized from the target voice is obtained, the target text recognized from the target voice received by the in-vehicle device or the voice input device can be obtained. The voice input device may be a microphone or the like, and the in-vehicle device may include the voice input device, which improves the efficiency of semantic recognition in in-vehicle scenarios.

The technical solution of the present invention is described below with reference to a preferred embodiment.

Fig. 5 is a schematic diagram of semantic interaction according to an embodiment of the present invention. As shown in Fig. 5, the target voice can be obtained through a voice input device; for example, the target voice is "play Heart Too Soft" in the music field. The target voice is recognized as the target text, which realizes speech recognition. The semantics of the target text are then identified, that is, semantic understanding is performed. Finally, the command indicated by the semantics of the target text is generated and executed by a player, which realizes intent execution. For example, the player executes the command to play "Heart Too Soft", thereby providing the voice service.

The overall design framework of the semantic word segmentation improvement method based on online field-labeled data according to the embodiment of the present invention is introduced below.

Fig. 6 is a schematic diagram of a semantic recognition system according to an embodiment of the present invention. As shown in Fig. 6, the system includes a field lexicon model, a word segmentation module, a decision model, a semantic parsing model, a dictionary synchronization module and the like.

The field lexicon model of this embodiment is introduced below.

This embodiment can import, into the field lexicon model, data mined by operations, map point of interest (Point of Interest, POI) update data, recommendation data for listening content, local address book data and the like. In addition, other field data related to in-vehicle voice interaction can also be imported into the field lexicon model.

Fig. 7 is a schematic diagram of a field lexicon model according to an embodiment of the present invention. As shown in Fig. 7, in-vehicle field data can be imported in the form of part-of-speech tagging plus named entities. This embodiment can import newly launched functions mined by operations, for example by importing the command dictionary cmd.dict, which may include turn on/VA_launch, screen/NC_screen and I want to go/VP. It can import city update data from the map platform, for example by importing the point-of-interest dictionary poi.dict, which may include Window on the World/NP, Guangzhou/NP and Tibet/NP/NM, where NP indicates that the segmented word belongs to the map field and NM indicates that the segmented word belongs to the music field. It can import music and radio recommendation data, for example by importing the listening-content dictionary ting.dict, which includes Nan Shannan/NM, Heart Too Soft/NM and Logical Thinking/NR. It can import the local address book, for example by importing the address book dictionary person.dict, which includes father/NC/NM, Zhou Jielun/NC/NS and Zhang San/NC, where NC indicates that the segmented word belongs to the address book and NS indicates that the segmented word belongs to the star field. After the above data are imported into the field database, the dictionary can be expanded as words_Xxx.dict. VA, NC, VP, NP, NM and NS are markup information indicating the field to which a segmented word belongs and can be defined in advance.
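The "word/TAG" entry format shown above can be read with a few lines of code. This is a minimal sketch that assumes one entry per line and "/" as the only separator; the function names and the TAG_FIELDS table are hypothetical, and the table covers only the four tags whose meaning the text states (NP, NM, NC as used in person.dict, and NS).

```python
# Tag meanings as stated in the description; other tags are left unmapped.
TAG_FIELDS = {"NP": "map", "NM": "music", "NC": "address book", "NS": "star"}


def parse_entry(line):
    """Split 'word/TAG1/TAG2' into the word and its list of markup tags."""
    word, *tags = line.strip().split("/")
    return word, tags


def load_lexicon(lines):
    """Build a word -> tags mapping, merging tags when a word appears twice."""
    lexicon = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, tags = parse_entry(line)
        known = lexicon.setdefault(word, [])
        for tag in tags:
            if tag not in known:
                known.append(tag)
    return lexicon


def fields_of(lexicon, word):
    """Translate the tags of a word into field names where the meaning is known."""
    return [TAG_FIELDS.get(tag, tag) for tag in lexicon.get(word, [])]


lexicon = load_lexicon(["Tibet/NP/NM", "Nan Shannan/NM", "Zhang San/NC", "Zhou Jielun/NC/NS"])
print(fields_of(lexicon, "Tibet"))
```

A word such as "Tibet" carries two tags, so a lookup returns both candidate fields and leaves the choice to the later semantic parsing step.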

The field lexicon model of this embodiment can be searched in real time, so that word matching and semantic understanding are optimized and become more reasonable. For example, for the input "Nan Shannan", the phrase "Nan Shannan" is matched in the field lexicon model, rather than the phrase "Nan Shan_Nan". During semantic parsing, "Nan Shannan" can be resolved as music/NM rather than address/NP, which can raise the content recognition rate from the original 49.5% to 81.5% and improve the efficiency of semantic recognition.

The word segmentation module of this embodiment is introduced below.

The word segmentation module of this embodiment can perform efficient dictionary-based segmentation queries in the field lexicon model. After hundreds of thousands of data entries are imported, and because the hardware of the in-vehicle environment is less powerful than that of a server, the word segmentation module introduces the Trie tree, an efficient storage and index structure, which can greatly improve the efficiency of segmentation queries in the field lexicon model. Finally, a double-array Trie tree structure is introduced to further reduce the memory footprint of the ordinary Trie tree.

Fig. 8 is a schematic diagram of a dictionary tree according to an embodiment of the present invention. As shown in Fig. 8, the target text recognized from the target voice is "navigate to Xili", which needs to be retrieved quickly in massive field data and split into the phrases "navigate", "to" and "Xili".

In the dictionary tree of this embodiment, the path of "navigator" contains "navigate", the path of "arrive" contains "to", and the path of "Xili Lake" contains "Xili". The retrieval efficiency over massive dictionary data is independent of the number of dictionary entries, and dictionary entries that share a common prefix can share storage space. Compared with a scheme that builds a hash table and then concatenates characters from left to right and looks each candidate up in the hash table, this saves more space. For example, for 100,000 dictionary entries, the hash table scheme uses about 60M of memory, whereas the dictionary tree scheme uses about 45M, a reduction of 25%.
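The prefix sharing described above can be illustrated with an ordinary dictionary tree built from nested mappings. This is a minimal sketch, not the structure used by the embodiment: Latin strings stand in for the original words ("navi" for "navigate", "navigator", "to", "xili", "xilihu" for "Xili Lake"), the class and function names are hypothetical, and the segment helper uses a plain forward longest match rather than the decision model described later.

```python
END = ""  # key that marks "a word ends here"; it cannot collide with a character


class Trie:
    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})  # shared prefixes reuse existing nodes
        node[END] = True

    def prefixes(self, text, start=0):
        """Return every dictionary word that begins at text[start], shortest first."""
        node, found = self.root, []
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                found.append(text[start:i + 1])
        return found


def segment(trie, text):
    """Forward longest match; an unknown character becomes a one-character word."""
    out, pos = [], 0
    while pos < len(text):
        found = trie.prefixes(text, pos)
        word = found[-1] if found else text[pos]
        out.append(word)
        pos += len(word)
    return out


trie = Trie(["navi", "navigator", "to", "xili", "xilihu"])
print(segment(trie, "navitoxili"))
```

Each call to prefixes walks at most as many nodes as there are characters left in the text, so the cost of a lookup does not grow with the number of dictionary entries.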

In addition, an ordinary dictionary tree has low space utilization, although its algorithm is simple. A double-array dictionary tree can save 17% of memory and optimizes the memory footprint of the ordinary dictionary tree; it can use a state-transition-matrix algorithm, which is more complex.

The decision model of this embodiment is introduced below.

The decision model of this embodiment imports word segmentation rules and, together with the online field lexicon model and the word segmentation module described above, forms the word segmentation improvement method based on online field data of this embodiment. Fig. 9 is a flowchart of another semantic recognition method according to an embodiment of the present invention. As shown in Fig. 9, the method includes the following steps:

Step S901: split the target text recognized from the target voice.

After the target voice is obtained, the target text recognized from the target voice is split; three adjacent words can be split off based on the dictionary. For example, "play Heart Too Soft", recognized from the target voice, is split into three adjacent words.

Table 1 First word segmentation result table

Number	Word segmentation result	Length
0	broadcasting_heart_too soft	5
1	broadcasting_heart_too	4
2	broadcast_trust_too soft	5
3	broadcast_trust_too	4
4	broadcast_put_heart	3

Table 1 is the first word segmentation result table according to an embodiment of the present invention. As shown in Table 1, it contains 5 word segmentation results, "broadcasting_heart_too soft", "broadcasting_heart_too", "broadcast_trust_too soft", "broadcast_trust_too" and "broadcast_put_heart", whose lengths are 5, 4, 5, 4 and 3 respectively.
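The dictionary-based splitting into three adjacent words that produces Table 1 can be sketched as a nested enumeration. This is a minimal illustration under stated assumptions: Latin letters stand in for the five characters of the example text ("abcde"), the toy dictionary contains only the seven words needed for the example, and the function name three_word_chunks and the max_len parameter are hypothetical.

```python
def three_word_chunks(text, dictionary, max_len=3):
    """Enumerate every way to read three consecutive dictionary words from the start."""
    def words_at(pos):
        # Dictionary words that begin at pos, longest first.
        limit = min(max_len, len(text) - pos)
        return [text[pos:pos + n] for n in range(limit, 0, -1)
                if text[pos:pos + n] in dictionary]

    chunks = []
    for w1 in words_at(0):
        for w2 in words_at(len(w1)):
            for w3 in words_at(len(w1) + len(w2)):
                chunks.append((w1, w2, w3))
    return chunks  # texts shorter than three words yield no chunk in this sketch


# Stand-ins: "ab" broadcasting, "a" broadcast, "bc" trust, "b" put,
# "c" heart, "de" too soft, "d" too.
dictionary = {"ab", "a", "bc", "b", "c", "de", "d"}
for number, chunk in enumerate(three_word_chunks("abcde", dictionary)):
    print(number, "_".join(chunk), sum(len(w) for w in chunk))
```

With this toy dictionary the enumeration yields five chunks whose total lengths are 5, 4, 5, 4 and 3, in the same order as the rows of Table 1.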

Step S902: import word segmentation rules into the decision model.

After the target text recognized from the target voice is split, multiple word segmentation results are obtained. Word segmentation rules are imported into the decision model to determine the final word segmentation result.

Table 2 Second word segmentation result table

Number	Word segmentation result	Length
0	broadcasting_heart_too soft	5
2	broadcast_trust_too soft	5

Table 2 is the second word segmentation result table according to an embodiment of the present invention. As shown in Table 2, the word segmentation results with the largest length are first chosen from Table 1, that is, Max(x1+x2+...+xn), where x1 denotes the length of the 1st segmented word, x2 denotes the length of the 2nd segmented word, and xn denotes the length of the n-th segmented word in a word segmentation result. Because "broadcasting_heart_too soft" and "broadcast_trust_too soft" have the largest length, 5, among all word segmentation results, these two results are selected.

Table 3 Third word segmentation result table

Number	Word segmentation result	Length	Average length
0	broadcasting_heart_too soft	5	1.667
2	broadcast_trust_too soft	5	1.667

Table 3 is the third word segmentation result table according to an embodiment of the present invention. As shown in Table 3, the average length of the word segmentation results shown in Table 2 is calculated, that is, avg(x1+x2+...+xn), with length counted in characters of the original text. For example, in the word segmentation result "broadcasting_heart_too soft", the length of "broadcasting" is 2, the length of "heart" is 1 and the length of "too soft" is 2, so its average length is (2+1+2)/3 = 1.667. In the word segmentation result "broadcast_trust_too soft", the length of "broadcast" is 1, the length of "trust" is 2 and the length of "too soft" is 2, so its average length is (1+2+2)/3 = 1.667.

Table 4 Fourth word segmentation result table

Number	Word segmentation result	Length	Average length	Standard deviation
0	broadcasting_heart_too soft	5	1.667	0.577
2	broadcast_trust_too soft	5	1.667	0.577

Table 4 is the fourth word segmentation result table according to an embodiment of the present invention. As shown in Table 4, the standard deviation of each word segmentation result is calculated; the standard deviation reflects the variation in word length within a word segmentation result: sqrt(((x1-x)^2+(x2-x)^2+...+(xn-x)^2)/(n-1)), where x denotes the average length. For example, the standard deviation of the word segmentation result "broadcasting_heart_too soft" is sqrt(((2-1.667)^2+(1-1.667)^2+(2-1.667)^2)/(3-1)) = 0.577, and the standard deviation of the word segmentation result "broadcast_trust_too soft" is sqrt(((1-1.667)^2+(2-1.667)^2+(2-1.667)^2)/(3-1)) = 0.577.

Table 5 Fifth word segmentation result table

Table 5 is the fifth word segmentation result table according to an embodiment of the present invention. As shown in Table 5, a natural logarithm is computed for each word segmentation result; this value reflects the free morpheme degree of single morphemes, and the word segmentation result with the highest free morpheme degree is selected. In the first round, "broadcasting" in "broadcasting _ heart _ too soft" wins, and the subsequent "heart _ too soft" likewise wins. That is, the word segmentation result "broadcasting _ heart _ too soft" is selected as the target word segmentation result for the target text, namely as the optimal word segment combination of the target text.
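The text does not spell out how the natural logarithm is applied. One common reading, borrowed from maximum-matching segmenters, is to sum the natural logarithm of a frequency count over the single-character words of each candidate. The sketch below assumes that reading; the frequency values are invented for illustration and do not come from the embodiment.

```python
import math

# Hypothetical frequency counts for single-character words (assumed values).
SINGLE_CHAR_FREQ = {"heart": 5000, "broadcast": 800}


def free_morpheme_degree(words, lengths):
    """Sum ln(frequency) over the words of length 1 in one candidate (assumed rule)."""
    return sum(
        math.log(SINGLE_CHAR_FREQ[word])
        for word, length in zip(words, lengths)
        if length == 1
    )


degree_first = free_morpheme_degree(["broadcasting", "heart", "too soft"], [2, 1, 2])
degree_second = free_morpheme_degree(["broadcast", "trust", "too soft"], [1, 2, 2])
winner = "first" if degree_first > degree_second else "second"
print(winner)
```

With these assumed counts the first candidate wins, which agrees with the outcome stated in the text.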

Step S903: execute the command corresponding to the target word segmentation result.

The command corresponding to the target word segmentation result is executed, for example, playing the song corresponding to "heart is too soft".

The semantic parsing model of this embodiment is introduced below.

The semantic parsing model performs semantic parsing on the word segments obtained by the word segmentation module. It can determine the semantics of a word segment from a semantic template policy table, which includes the field to which the word segment belongs, the command to execute, and the field markup information. After the semantic parsing model has parsed the semantics of the word segment, an application is called to execute the command indicated by those semantics. The semantic parsing model can also report information to the beacon database, for example, the field information corresponding to the word segment, the executed command, and status information.
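A minimal sketch of such a lookup is given below. The table layout, field names and command names are assumptions made for illustration; the embodiment only states that the table relates a field to an execution command and field markup information.

```python
# Hypothetical semantic template policy table: field -> command to execute.
POLICY_TABLE = {
    "music": {"command": "play_song"},
    "map": {"command": "navigate"},
}


def parse_semantics(segments):
    """segments: list of (word, field) pairs; field is None for unlabeled words."""
    for word, field in segments:
        if field in POLICY_TABLE:
            return {
                "field": field,
                "command": POLICY_TABLE[field]["command"],
                "argument": word,
            }
    return {"field": None, "command": "unknown", "argument": None}


result = parse_semantics([("play", None), ("heart is too soft", "music")])
print(result)
```

The returned record contains the same items the text says are reported: the field, the command, and enough state to tell whether parsing succeeded.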

The dictionary synchronization module of this embodiment is introduced below.

The dictionary synchronization module of this embodiment runs locally on the in-vehicle device and keeps the data of the local dictionary synchronized with the cloud dictionary at all times. After product operators edit data in the backend, or after map or music data is updated automatically, the dictionary synchronization module incrementally updates the data to the local device and then imports the latest field data into the dictionary model, so that subsequent content can be matched correctly in the dictionary model.
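Incremental synchronization can be sketched as applying only those cloud changes that are newer than the local copy. The version counter and the shape of a change record are assumptions; the embodiment does not describe the synchronization protocol.

```python
def sync_increment(local_dictionary, cloud_changes, local_version):
    """Apply cloud changes newer than local_version; return the new local version.

    cloud_changes: list of (version, word, field) tuples sorted by version
    (a hypothetical record layout).
    """
    for version, word, field in cloud_changes:
        if version > local_version:
            local_dictionary[word] = field  # add a new word or overwrite its field
            local_version = version
    return local_version


local = {"mountain top": "map"}
changes = [
    (1, "mountain top", "map"),
    (2, "mountain top", "music"),       # field corrected by an operator
    (3, "heart is too soft", "music"),  # newly added word
]
new_version = sync_increment(local, changes, local_version=1)
print(new_version, local)
```

Only the two changes with a version above 1 are applied, so the amount of transferred data stays proportional to what changed rather than to the size of the dictionary.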

User data reporting and backend operation analysis of this embodiment are introduced below.

This embodiment uses a "feedback-optimization" closed loop. Relying on a reporting module, the behavior and result data generated while the user uses semantic understanding and word segmentation are reported to the beacon database, so that product and operation personnel can track and analyze online problems and performance, for example, missing instructions, intent errors, and failures in distribution or execution. They can then sort and edit the dictionary and optimize the cloud dictionary, which may include words in the map field and words in the music field. The field dictionary model is then updated by the dictionary synchronization module through a combination of factory presets and incremental updates.
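As a rough illustration of the analysis side of this loop, reported records can be grouped by problem type so that operators see which dictionary entries to fix first. The record fields and status labels below are invented for the example and are not defined by the embodiment.

```python
from collections import Counter

# Hypothetical reported records; field names and status labels are assumptions.
reports = [
    {"text": "too hot", "field": None, "command": None, "status": "instruction_missing"},
    {"text": "go to the mountain top with me", "field": "map", "command": "navigate", "status": "intent_error"},
    {"text": "play heart is too soft", "field": "music", "command": "play_song", "status": "ok"},
    {"text": "too hot", "field": None, "command": None, "status": "instruction_missing"},
]


def problem_summary(records):
    """Count the reported problems by type, ignoring successful requests."""
    return Counter(r["status"] for r in records if r["status"] != "ok")


summary = problem_summary(reports)
print(summary.most_common())
```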

The technical effects achievable by the above method are reflected in solving the following speech-semantics problems. Missing instruction: for example, when the user says "too hot", the corresponding semantics should be "turn down the air-conditioner temperature" rather than unknown. Intent error: for example, when the user says "go to the mountain top with me", the corresponding semantics should be "play song" rather than navigation. Missing content: for example, when the user says "years highly skilled thief", the corresponding semantics should be "play the latest music of the son of tomorrow" rather than a film or other unknown content. Function update: for example, when a new in-vehicle voice function is launched and the user says "send a message to a certain friend", the traditional method requires updating the voice assistant to add support for message semantics. With the method of this embodiment, only the corresponding command dictionary and command semantic annotation need to be configured online, and the new function is supported once the update has been synchronized to the local device by the cloud synchronization module.

This embodiment combines a local offline speech word segmentation system with dynamic incremental updating of online field data. This ensures that the in-vehicle device can still perform speech semantic recognition in offline environments without a network (such as garages and remote areas), and that the local offline field model library is updated automatically when the network is in good condition.

Alternatively, the offline semantic model of the voice platform of this embodiment may lack the capability of online updating, in which case the local semantic model can be fully updated when the next major version is released.

It should be noted that, for brevity, the foregoing method embodiments are each described as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

According to another aspect of the embodiments of the present invention, a semantic recognition device for implementing the above semantic recognition method is further provided. Figure 10 is a schematic diagram of a semantic recognition device according to an embodiment of the present invention. As shown in Figure 10, the device may include an acquiring unit 10, a searching unit 20, a first determination unit 30, a second determination unit 40, and a third determination unit 50.

The acquiring unit 10 is configured to obtain the target text obtained by recognizing a target speech.

The searching unit 20 is configured to search a first database for a target word among the words of the target text, where the first database is used to store words with markup information, and the markup information indicates the field to which a word with markup information belongs.

The first determination unit 30 is configured to, when the target word is found in the first database, determine the target word with target markup information in the first database as a word segment of the target text, where the markup information includes the target markup information, and the target markup information indicates the field to which the target word belongs.

The second determination unit 40 is configured to determine the target semantics of the word segment according to the target markup information.

The third determination unit 50 is configured to determine the semantics of the target text according to the target semantics of the word segment.

Optionally, the searching unit includes a determining module and a searching module. The determining module is configured to select at least one word from the words of the target text and determine the at least one word as the target word; the searching module is configured to search for the target word in the first database.

It should be noted that the acquiring unit 10 in this embodiment may be used to execute step S402 in the embodiments of the present application, the searching unit 20 may be used to execute step S404, the first determination unit 30 may be used to execute step S406, the second determination unit 40 may be used to execute step S408, and the third determination unit 50 may be used to execute step S410.

In this embodiment, the acquiring unit 10 obtains the target text obtained by recognizing the target speech; the searching unit 20 searches the first database for the target word among the words of the target text, where the first database is used to store words with markup information and the markup information indicates the field to which a word with markup information belongs; when the target word is found in the first database, the first determination unit 30 determines the target word with target markup information in the first database as a word segment of the target text, where the markup information includes the target markup information and the target markup information indicates the field to which the target word belongs; the second determination unit 40 determines the target semantics of the word segment according to the target markup information; and the third determination unit 50 determines the semantics of the target text according to the target semantics of the word segment. Because the word segments of the target text carry markup information identifying the field to which each word segment belongs, and the semantics of the target text are determined on that basis, the semantics of the target text can be recognized correctly. This overcomes the problems in the related art that a full-scale dictionary occupies a large amount of memory while a lightweight customized dictionary does not fully cover field words, thereby achieving the technical effect of improving the efficiency of semantic recognition and solving the technical problem of low semantic recognition efficiency in the related art.
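The cooperation of the five units can be sketched as one small pipeline. Everything concrete here is assumed for illustration: the first database is a plain dictionary from word to field, the field-to-semantics mapping is invented, and the text is taken to be already split into candidate words.

```python
# Hypothetical first database: word -> markup information (the field).
FIRST_DATABASE = {"heart is too soft": "music", "people's square": "map"}
# Hypothetical mapping from field to the semantics of a word segment.
FIELD_SEMANTICS = {"music": "play song", "map": "navigate to"}


def recognize(target_text_words):
    """Return the semantics of a target text given its candidate words."""
    # Searching unit: look up the words of the target text in the first database.
    found = [w for w in target_text_words if w in FIRST_DATABASE]
    # First determination unit: found words become word segments with their field.
    segments = [(w, FIRST_DATABASE[w]) for w in found]
    # Second determination unit: field markup -> target semantics of each segment.
    segment_semantics = [(w, FIELD_SEMANTICS.get(f, "unknown")) for w, f in segments]
    # Third determination unit: combine into the semantics of the whole text.
    if not segment_semantics:
        return "unknown"
    word, meaning = segment_semantics[0]
    return f"{meaning}: {word}"


semantics = recognize(["play", "heart is too soft"])
print(semantics)
```

Taking the first labeled segment as decisive is a simplification of this sketch; the embodiment leaves the combination rule to the semantic template policy table.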

It should be noted here that the above units and modules implement the same examples and application scenarios as their corresponding steps, but are not limited to the content disclosed in the above embodiments. It should also be noted that the above modules, as part of the device, may run in the hardware environment shown in Figure 3, and may be implemented by software or by hardware, where the hardware environment includes a network environment.

According to another aspect of the embodiments of the present invention, an electronic device for implementing the above semantic recognition method is further provided.

Figure 11 is a structural block diagram of an electronic device according to an embodiment of the present invention. As shown in Figure 11, the electronic device may include one or more processors 111 (only one is shown in the figure) and a memory 113. Optionally, as shown in Figure 11, the electronic device may further include a transmission device 115.

The memory 113 may be used to store software programs and modules, such as the program instructions/modules corresponding to the semantic recognition method and device in the embodiments of the present invention. The processor 111 runs the software programs and modules stored in the memory 113 to execute various functional applications and data processing, that is, to implement the above semantic recognition method. The memory 113 may include a high-speed random access memory and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memories, or other non-volatile solid-state memories. In some examples, the memory 113 may further include memories located remotely from the processor 111, and these remote memories may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The transmission device 115 is used to receive or send data via a network and may also be used for data transmission between the processor and the memory. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device 115 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router through a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device 115 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.

Specifically, the memory 113 is used to store an application program.

The processor 111 may call, through the transmission device 115, the application program stored in the memory 113 to execute the following steps:

obtaining a target text obtained by recognizing a target speech;

searching a first database for a target word among the words of the target text, where the first database is used to store words with markup information, and the markup information indicates the field to which a word with markup information belongs;

when the target word is found in the first database, determining the target word with target markup information in the first database as a word segment of the target text, where the markup information includes the target markup information, and the target markup information indicates the field to which the target word belongs;

determining the target semantics of the word segment according to the target markup information;

determining the semantics of the target text according to the target semantics of the word segment.

The processor 111 is further configured to execute the following steps: obtaining at least one word of the target text; searching the first database for the at least one word; and, when the at least one word found in the first database has been annotated with markup information, determining the at least one word annotated with markup information as at least one word segment of the target text.

The processor 111 is further configured to execute the following steps: selecting at least one word from the words of the target text and determining the at least one word as the target word; and searching for the target word in the first database.

The processor 111 is further configured to execute the following steps: before searching the first database for the target word among the words of the target text, obtaining from a server the words with markup information in multiple fields, where the words with markup information are annotated with the markup information on the server; and importing the words with markup information into the first database.

The processor 111 is further configured to execute the following step: searching a first database having a dictionary tree structure for the target word among the words of the target text, where the words with markup information in multiple fields are distributed over multiple paths of the dictionary tree structure.
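A dictionary tree (trie) that stores field markup at the node where a word ends can be sketched as follows. The class and method names are illustrative, and the sample words are placeholders rather than entries from the embodiment.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> child node
        self.field = None   # markup information if a word ends at this node


class FieldTrie:
    """Dictionary tree in which each stored word carries its field markup."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, field):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.field = field

    def lookup(self, word):
        """Return the field of word, or None if word is not stored."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.field


trie = FieldTrie()
trie.insert("play", "music")  # words from different fields share the prefix "pla"
trie.insert("plaza", "map")   # and then continue along different paths
print(trie.lookup("play"), trie.lookup("plaza"), trie.lookup("pla"))
```

Because words from all fields live in one tree, a single traversal both confirms that a word exists and returns the field it belongs to.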

The processor 111 is further configured to execute the following steps: before searching the first database for the target word among the words of the target text, obtaining from the server the updated words with markup information in multiple fields; adding the updated words with markup information to the first database to obtain an updated first database; and searching the updated first database for the target word among the words of the target text.

The processor 111 is further configured to execute the following steps: obtaining from the server newly added words with markup information in multiple fields, where the newly added words with markup information are not yet stored in the first database, and the updated words with markup information include the newly added words with markup information; and adding the newly added words with markup information to the first database to obtain the updated first database.

The processor 111 is further configured to execute the following steps: obtaining from the server modified words with markup information in multiple fields, where the first database stores the pre-modification words with markup information that correspond to the modified words with markup information, and the updated words with markup information include the modified words with markup information; and replacing the pre-modification words with markup information stored in the first database with the modified words with markup information to obtain the updated first database.
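The two update cases, adding a new word and replacing a modified one, can be told apart by whether the word is already stored. In the sketch below the first database is again assumed to be a dictionary from word to markup information, and the function name is illustrative.

```python
def apply_update(first_database, updated_words):
    """Apply updated words; return the lists of newly added and of replaced words."""
    added, replaced = [], []
    for word, markup in updated_words.items():
        if word in first_database:
            replaced.append(word)  # pre-modification entry exists: replace it
        else:
            added.append(word)     # not stored yet: add it
        first_database[word] = markup
    return added, replaced


first_database = {"mountain top": "map"}
added, replaced = apply_update(
    first_database,
    {"mountain top": "music", "heart is too soft": "music"},
)
print(added, replaced)
```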

The processor 111 is further configured to execute the following steps: obtaining multiple division results produced by dividing the target text, where each division result consists of words of the target text; determining, among the multiple division results, a target division result that satisfies a target rule; and searching the first database for the target word among the words of the target division result.

The processor 111 is further configured to execute the following steps: among the multiple division results, obtaining the sum of the lengths of all words in each division result and determining the first division results for which the sum of the lengths of all words is greatest, where the number of first division results is a first quantity; when the first quantity is 1, determining the first division result as the target division result satisfying the target rule; when the first quantity is not 1, obtaining, among the first quantity of first division results, the average length of all words in each first division result and determining the second division results for which the average length of all words is greatest, where the number of second division results is a second quantity, and the second quantity is less than or equal to the first quantity; when the second quantity is 1, determining the second division result as the target division result satisfying the target rule; when the second quantity is not 1, obtaining, among the second quantity of second division results, the word-length variation of all words in each second division result and determining the third division results for which the word-length variation of all words is smallest, where the number of third division results is a third quantity, and the third quantity is less than or equal to the second quantity; when the third quantity is 1, determining the third division result as the target division result satisfying the target rule; when the third quantity is not 1, obtaining, among the third quantity of third division results, the free morpheme degree of all words in each third division result and determining the fourth division results for which the free morpheme degree of all words is highest, where the number of fourth division results is a fourth quantity, the fourth quantity is less than or equal to the third quantity, and the free morpheme degree indicates the probability that a word and a morpheme form a new word; and when the fourth quantity is 1, determining the fourth division result as the target division result satisfying the target rule.
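The four-stage selection can be sketched as a cascade of filters that stops as soon as one division result remains. Several points are assumptions of this sketch: the placeholder text "ABCDE" stands in for a five-character utterance, the word-length variation is taken to be the sample standard deviation, the free morpheme degree is taken to be the sum of ln(frequency) over single-character words with invented frequencies, and the first survivor is returned if a tie remains after the fourth rule (a case the text does not cover).

```python
import math
import statistics

# Hypothetical frequency counts for single-character words (assumed values).
FREQ = {"C": 5000, "A": 800}


def total_length(words):
    return sum(len(w) for w in words)


def average_length(words):
    return total_length(words) / len(words)


def negative_spread(words):
    lengths = [len(w) for w in words]
    # Smaller variation is better, so negate it to reuse the "keep maximum" helper.
    return -statistics.stdev(lengths) if len(lengths) > 1 else 0.0


def free_morpheme_degree(words):
    return sum(math.log(FREQ.get(w, 1)) for w in words if len(w) == 1)


def keep_best(results, score):
    scores = [score(r) for r in results]
    best = max(scores)
    return [r for r, s in zip(results, scores) if math.isclose(s, best)]


def select_target_division(results):
    for rule in (total_length, average_length, negative_spread, free_morpheme_degree):
        results = keep_best(results, rule)
        if len(results) == 1:
            break
    return results[0]  # assumption: take the first survivor if a tie remains


divisions = [
    ["AB", "C", "DE"],      # lengths 2, 1, 2
    ["A", "BC", "DE"],      # lengths 1, 2, 2
    ["A", "B", "C", "DE"],  # same total length, lower average length
    ["AB", "C", "D"],       # shorter total length
]
target = select_target_division(divisions)
print(target)
```

In this run the first rule removes the last candidate, the second removes the four-word candidate, the third leaves a tie, and the fourth decides in favor of the candidate containing the more frequent single-character word.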

The processor 111 is further configured to execute the following steps: after determining the semantics of the target text according to the target semantics of the word segment, reporting data indicating the semantics of the target text to the server, where the data indicating the semantics of the target text is used to update a second database on the server, and the second database is used to store the words in multiple fields; and updating the first database through the second database.

The processor 111 is further configured to execute the following step: obtaining the target text obtained by recognizing a target speech received by an in-vehicle device or a voice input device.

In the embodiments of the present invention, a target text obtained by recognizing a target speech is obtained; a first database is searched for a target word among the words of the target text, where the first database is used to store words with markup information, and the markup information indicates the field to which a word with markup information belongs; when the target word is found in the first database, the target word with target markup information in the first database is determined as a word segment of the target text, where the markup information includes the target markup information, and the target markup information indicates the field to which the target word belongs; the target semantics of the word segment are determined according to the target markup information; and the semantics of the target text are determined according to the target semantics of the word segment. Because the word segments of the target text carry markup information identifying the field to which each word segment belongs, and the semantics of the target text are determined on that basis, the semantics of the target text can be recognized correctly. This achieves the technical effect of improving the efficiency of semantic recognition and solves the technical problem of low semantic recognition efficiency in the related art.

Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, and details are not repeated here.

Those of ordinary skill in the art will understand that the structure shown in Figure 11 is only illustrative. The electronic device may be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, or another electronic device. Figure 11 does not limit the structure of the above electronic device. For example, the electronic device may include more or fewer components than shown in Figure 11 (such as a network interface or a display device), or have a configuration different from that shown in Figure 11.

Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware of the electronic device. The program can be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.

The embodiments of the present invention further provide a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the semantic recognition method.

Optionally, in this embodiment, the storage medium may be located on at least one of the multiple network devices in the network shown in the above embodiments.

Optionally, in this embodiment, the storage medium is configured to store program code for executing the following steps:

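obtaining a target text obtained by recognizing a target speech;

searching a first database for a target word among the words of the target text, where the first database is used to store words with markup information, and the markup information indicates the field to which a word with markup information belongs;

when the target word is found in the first database, determining the target word with target markup information in the first database as a word segment of the target text, where the markup information includes the target markup information, and the target markup information indicates the field to which the target word belongs;

determining the target semantics of the word segment according to the target markup information;

determining the semantics of the target text according to the target semantics of the word segment.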

Optionally, the storage medium is further configured to store program code for executing the following steps: selecting at least one word from the words of the target text and determining the at least one word as the target word; and searching for the target word in the first database.

Optionally, the storage medium is further configured to store program code for executing the following steps: before searching the first database for the target word among the words of the target text, obtaining from a server the words with markup information in multiple fields, where the words with markup information are annotated with the markup information on the server; and importing the words with markup information into the first database.

Optionally, the storage medium is further configured to store program code for executing the following step: searching a first database having a dictionary tree structure for the target word among the words of the target text, where the words with markup information in multiple fields are distributed over multiple paths of the dictionary tree structure.

Optionally, the storage medium is further configured to store program code for executing the following steps: before searching the first database for the target word among the words of the target text, obtaining from the server the updated words with markup information in multiple fields; adding the updated words with markup information to the first database to obtain an updated first database; and searching the updated first database for the target word among the words of the target text.

Optionally, the storage medium is further configured to store program code for executing the following steps: obtaining from the server newly added words with markup information in multiple fields, where the newly added words with markup information are not yet stored in the first database, and the updated words with markup information include the newly added words with markup information; and adding the newly added words with markup information to the first database to obtain the updated first database.

Optionally, the storage medium is further configured to store program code for executing the following steps: obtaining from the server modified words with markup information in multiple fields, where the first database stores the pre-modification words with markup information that correspond to the modified words with markup information, and the updated words with markup information include the modified words with markup information; and replacing the pre-modification words with markup information stored in the first database with the modified words with markup information to obtain the updated first database.

Optionally, the storage medium is further configured to store program code for executing the following steps: obtaining multiple division results produced by dividing the target text, where each division result consists of words of the target text; determining, among the multiple division results, a target division result that satisfies a target rule; and searching the first database for the target word among the words of the target division result.

Optionally, the storage medium is further configured to store program code for executing the following steps: among the multiple division results, obtaining the sum of the lengths of all words in each division result and determining the first division results for which the sum of the lengths of all words is greatest, where the number of first division results is a first quantity; when the first quantity is 1, determining the first division result as the target division result satisfying the target rule; when the first quantity is not 1, obtaining, among the first quantity of first division results, the average length of all words in each first division result and determining the second division results for which the average length of all words is greatest, where the number of second division results is a second quantity, and the second quantity is less than or equal to the first quantity; when the second quantity is 1, determining the second division result as the target division result satisfying the target rule; when the second quantity is not 1, obtaining, among the second quantity of second division results, the word-length variation of all words in each second division result and determining the third division results for which the word-length variation of all words is smallest, where the number of third division results is a third quantity, and the third quantity is less than or equal to the second quantity; when the third quantity is 1, determining the third division result as the target division result satisfying the target rule; when the third quantity is not 1, obtaining, among the third quantity of third division results, the free morpheme degree of all words in each third division result and determining the fourth division results for which the free morpheme degree of all words is highest, where the number of fourth division results is a fourth quantity, the fourth quantity is less than or equal to the third quantity, and the free morpheme degree indicates the probability that a word and a morpheme form a new word; and when the fourth quantity is 1, determining the fourth division result as the target division result satisfying the target rule.

Optionally, the storage medium is further configured to store program code for executing the following steps: after determining the semantics of the target text according to the target semantics of the word segment, reporting data indicating the semantics of the target text to the server, where the data indicating the semantics of the target text is used to update a second database on the server, and the second database is used to store the words in multiple fields; and updating the first database through the second database.

Optionally, the storage medium is further configured to store program code for executing the following step: obtaining the target text obtained by recognizing a target speech received by an in-vehicle device or a voice input device.

Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.

The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.

If the integrated unit in above-described embodiment is realized in the form of SFU software functional unit and as independent product When selling or using, it can store in above-mentioned computer-readable storage medium.Based on this understanding, skill of the invention Substantially all or part of the part that contributes to existing technology or the technical solution can be with soft in other words for art scheme The form of part product embodies, which is stored in a storage medium, including some instructions are used so that one Platform or multiple stage computers equipment (can be personal computer, server or network equipment etc.) execute each embodiment institute of the present invention State all or part of the steps of method.

In the above embodiment of the invention, it all emphasizes particularly on different fields to the description of each embodiment, does not have in some embodiment The part of detailed description, reference can be made to the related descriptions of other embodiments.

In several embodiments provided herein, it should be understood that disclosed client, it can be by others side Formula is realized.Wherein, the apparatus embodiments described above are merely exemplary, such as the division of the unit, and only one Kind of logical function partition, there may be another division manner in actual implementation, for example, multiple units or components can combine or It is desirably integrated into another system, or some features can be ignored or not executed.Another point, it is shown or discussed it is mutual it Between coupling, direct-coupling or communication connection can be through some interfaces, the INDIRECT COUPLING or communication link of unit or module It connects, can be electrical or other forms.

The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.

It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.Above-mentioned integrated list Member both can take the form of hardware realization, can also realize in the form of software functional units.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for recognizing semantics, characterized by comprising:

obtaining a target text obtained by recognizing a target voice;

searching a first database for a target word among the words of the target text, wherein the first database is used to store words having markup information, and the markup information is used to indicate the field to which a word having the markup information belongs;

in a case where the target word is found in the first database, determining the target word having target markup information in the first database as a participle of the target text, wherein the markup information includes the target markup information, and the target markup information is used to indicate the field to which the target word belongs;

determining target semantics of the participle according to the target markup information; and

determining the semantics of the target text according to the target semantics of the participle.
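The steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the first database is modeled as a plain dictionary from word to field label, the target text is assumed to be whitespace-separated, and the names `find_participles`, `target_semantics`, and `text_semantics` are hypothetical.

```python
from typing import Dict, List, Tuple


def find_participles(target_text: str,
                     first_database: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (word, field) pairs for every word of the text found in the database.

    Assumption: words are separated by whitespace; a real system would use a
    segmentation step instead of str.split().
    """
    participles = []
    for word in target_text.lower().split():
        field = first_database.get(word)  # markup information, or None if absent
        if field is not None:
            participles.append((word, field))
    return participles


def target_semantics(word: str, field: str) -> Dict[str, str]:
    """Hypothetical mapping from a participle and its field to its semantics."""
    return {"field": field, "value": word}


def text_semantics(target_text: str,
                   first_database: Dict[str, str]) -> List[Dict[str, str]]:
    """Combine the semantics of all participles into the semantics of the text."""
    return [target_semantics(word, field)
            for word, field in find_participles(target_text, first_database)]
```

With a database such as `{"navigate": "navigation", "airport": "place"}`, calling `text_semantics("Navigate to the airport", database)` keeps only the two words that carry markup information and labels each with its field.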

2. The method according to claim 1, characterized in that searching the first database for the target word among the words of the target text comprises:

selecting at least one word from the words of the target text, and determining the at least one word as the target word; and

searching for the target word in the first database.

3. The method according to claim 1, characterized in that, before searching the first database for the target word among the words of the target text, the method further comprises:

obtaining, from a server, words having the markup information in multiple fields, wherein the words having the markup information are marked with the markup information on the server; and

importing the words having the markup information into the first database.

4. The method according to claim 1, characterized in that searching the first database for the target word among the words of the target text comprises:

searching the first database, which has a dictionary tree structure, for the target word among the words of the target text, wherein the words having the markup information in multiple fields are distributed over multiple paths of the dictionary tree structure.
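A dictionary tree (trie) lookup of the kind recited in claim 4 can be sketched as follows. This is an illustrative sketch only: the class names `TrieNode` and `Trie`, the `longest_match` helper, and the example field labels are assumptions that do not appear in the source.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.field = None    # markup information; set only where a word ends


class Trie:
    """Hypothetical first database: each stored word is one path from the root."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, field):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.field = field

    def lookup(self, word):
        """Return the field of an exactly matching word, or None."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.field

    def longest_match(self, text, start=0):
        """Return (word, field) for the longest stored word starting at `start`."""
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.field is not None:
                best = (text[start:i + 1], node.field)
        return best
```

Because words that share a prefix share a path, one walk over the characters of the text can check every stored word that begins at a given position, which is why a trie is a common choice for dictionary-based segmentation.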

5. The method according to claim 1, characterized in that:

before searching the first database for the target word among the words of the target text, the method further comprises: obtaining, from a server, updated words having the markup information in multiple fields; and adding the updated words having the markup information to the first database to obtain an updated first database; and

searching the first database for the target word among the words of the target text comprises: searching the updated first database for the target word among the words of the target text.

6. The method according to claim 5, characterized in that:

obtaining, from the server, the updated words having the markup information in the multiple fields comprises: obtaining, from the server, newly added words having the markup information in the multiple fields, wherein the newly added words having the markup information are not stored in the first database, and the updated words having the markup information include the newly added words having the markup information; and

adding the updated words having the markup information to the first database to obtain the updated first database comprises: adding the newly added words having the markup information to the first database to obtain the updated first database.

7. The method according to claim 6, characterized in that:

obtaining, from the server, the updated words having the markup information in the multiple fields comprises: obtaining, from the server, modified words having the markup information in the multiple fields, wherein the first database stores pre-modification words having the markup information that correspond to the modified words having the markup information, and the updated words having the markup information include the modified words having the markup information; and

adding the updated words having the markup information to the first database to obtain the updated first database comprises: replacing the pre-modification words having the markup information that are stored in the first database with the modified words having the markup information to obtain the updated first database.
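The two kinds of update described in claims 5 to 7 (adding new words and replacing modified ones) can be sketched as below. The sketch makes assumptions that the source does not state: the first database is a dictionary from word to field label, a modification changes only the markup information of an existing word, and the function name `apply_update` is hypothetical.

```python
from typing import Dict


def apply_update(first_database: Dict[str, str],
                 added: Dict[str, str],
                 modified: Dict[str, str]) -> Dict[str, str]:
    """Return an updated copy of the first database.

    added:    newly added words (not yet stored locally) with their fields
    modified: words already stored locally whose markup information changed
    """
    updated = dict(first_database)  # leave the caller's database untouched
    for word, field in added.items():
        if word not in updated:     # only words that are not stored yet are added
            updated[word] = field
    for word, field in modified.items():
        if word in updated:         # replace the pre-modification entry
            updated[word] = field
    return updated
```

Returning a copy rather than mutating in place is a design choice of this sketch; it lets a caller keep searching the old database until the updated one is complete.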

8. The method according to claim 1, characterized in that searching the first database for the target word among the words of the target text comprises:

obtaining multiple division results obtained by dividing the target text, wherein each division result consists of words of the target text;

determining, among the multiple division results, a target division result that satisfies a target rule; and

searching the first database for the target word among the words of the target division result.
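One way to obtain the multiple division results of claim 8 is to enumerate every split of the text into known words. The following is a sketch under assumptions: the vocabulary is a plain set, single characters are always accepted so that every text has at least one division, and the name `divisions` and the `max_len` limit are hypothetical.

```python
from typing import List, Set


def divisions(text: str, vocabulary: Set[str], max_len: int = 4) -> List[List[str]]:
    """Enumerate all ways to split `text` into vocabulary words.

    Single characters are always allowed (an assumption of this sketch).
    The number of results grows quickly with text length, so this is only
    suitable for short inputs.
    """
    if not text:
        return [[]]
    results = []
    for end in range(1, min(max_len, len(text)) + 1):
        head = text[:end]
        if end == 1 or head in vocabulary:
            for tail in divisions(text[end:], vocabulary, max_len):
                results.append([head] + tail)
    return results
```

Each returned list is one candidate division result, from which a target division result can then be selected by a rule.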

9. The method according to claim 8, characterized in that determining, among the multiple division results, the target division result that satisfies the target rule comprises:

among the multiple division results, obtaining the sum of the lengths of all words in each division result, and determining the first division results for which the sum of the lengths of all words is largest, wherein the number of the first division results is a first quantity; in a case where the first quantity is 1, determining the first division result as the target division result that satisfies the target rule;

in a case where the first quantity is not 1, among the first quantity of first division results, obtaining the average length of all words in each first division result, and determining the second division results for which the average length of all words is largest, wherein the number of the second division results is a second quantity, and the second quantity is less than or equal to the first quantity; in a case where the second quantity is 1, determining the second division result as the target division result that satisfies the target rule;

in a case where the second quantity is not 1, among the second quantity of second division results, obtaining the word-length variation of all words in each second division result, and determining the third division results for which the word-length variation of all words is smallest, wherein the number of the third division results is a third quantity, and the third quantity is less than or equal to the second quantity; in a case where the third quantity is 1, determining the third division result as the target division result that satisfies the target rule;

in a case where the third quantity is not 1, among the third quantity of third division results, obtaining the free morpheme degree of all words in each third division result, and determining the fourth division results for which the free morpheme degree of all words is highest, wherein the number of the fourth division results is a fourth quantity, the fourth quantity is less than or equal to the third quantity, and the free morpheme degree is used to indicate the probability that a word and a morpheme form a new word; in a case where the fourth quantity is 1, determining the fourth division result as the target division result that satisfies the target rule.
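The four-stage tie-breaking cascade of claim 9 can be sketched as follows. Several details are assumptions of this sketch rather than statements of the source: the word-length variation is measured as the population variance of word lengths, the free morpheme degree of a division result is the sum of per-word scores taken from a hypothetical `freedom` table, and if more than one candidate survives all four rules the first survivor is returned.

```python
from statistics import mean, pvariance
from typing import Callable, Dict, List

Division = List[str]


def keep_best(candidates: List[Division],
              score: Callable[[Division], float],
              largest: bool = True) -> List[Division]:
    """Keep every candidate whose score equals the best score."""
    scores = [score(c) for c in candidates]
    best = max(scores) if largest else min(scores)
    return [c for c, s in zip(candidates, scores) if s == best]


def choose_division(candidates: List[Division],
                    freedom: Dict[str, float]) -> Division:
    """Apply the four rules in order, stopping once one candidate remains."""
    if not candidates:
        raise ValueError("at least one division result is required")
    rules = [
        (lambda d: sum(len(w) for w in d), True),              # largest total length
        (lambda d: mean([len(w) for w in d]), True),           # largest average length
        (lambda d: pvariance([len(w) for w in d]), False),     # smallest length variation
        (lambda d: sum(freedom.get(w, 0.0) for w in d), True), # highest free morpheme degree
    ]
    remaining = list(candidates)
    for score, largest in rules:
        remaining = keep_best(remaining, score, largest)
        if len(remaining) == 1:
            break
    return remaining[0]  # assumption: fall back to the first survivor on a full tie
```

Each rule only narrows the set left by the previous one, which matches the claim's requirement that the second, third, and fourth quantities are each no larger than the one before.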

10. The method according to any one of claims 1 to 9, characterized in that, after determining the semantics of the target text according to the target semantics of the participle, the method further comprises:

reporting data indicating the semantics of the target text to a server, wherein the data indicating the semantics of the target text is used to update a second database on the server, and the second database is used to store words in multiple fields; and

updating the first database by means of the second database.

11. The method according to any one of claims 1 to 9, characterized in that obtaining the target text obtained by recognizing the target voice comprises:

obtaining the target text obtained by recognizing the target voice received by an in-vehicle device or a voice input device.

12. A semantic recognition device, characterized by comprising:

an acquiring unit, configured to obtain a target text obtained by recognizing a target voice;

a searching unit, configured to search a first database for a target word among the words of the target text, wherein the first database is used to store words having markup information, and the markup information is used to indicate the field to which a word having the markup information belongs;

a first determination unit, configured to, in a case where the target word is found in the first database, determine the target word having target markup information in the first database as a participle of the target text, wherein the markup information includes the target markup information, and the target markup information is used to indicate the field to which the target word belongs;

a second determination unit, configured to determine target semantics of the participle according to the target markup information; and

a third determination unit, configured to determine the semantics of the target text according to the target semantics of the participle.

13. The device according to claim 12, characterized in that the searching unit comprises:

a determining module, configured to select at least one word from the words of the target text and determine the at least one word as the target word; and

a searching module, configured to search for the target word in the first database.

14. A storage medium, characterized in that the storage medium includes a stored program, wherein the program, when run, executes the method for recognizing semantics according to any one of claims 1 to 11.

15. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor executes, by means of the computer program, the method for recognizing semantics according to any one of claims 1 to 11.