CN105786793B - Method and apparatus for parsing the semantics of spoken language text information - Google Patents
- Publication number
- CN105786793B (application CN201510977813.0A)
- Authority
- CN
- China
- Prior art keywords
- feature
- field
- text information
- regular expression
- weighted value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a method and apparatus for parsing the semantics of spoken language text information. One specific embodiment of the method includes: segmenting received spoken language text information into words to extract features; determining the associated field of the spoken language text information from the nouns among the extracted features; in response to an extracted feature matching a preset feature of the associated field in a preset database, determining the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field, where the preset database may include, but is not limited to, weight values of preset features in multiple fields, and the multiple fields may include, but are not limited to, the associated field; determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the associated field for the text information; sorting the scores and obtaining a preset number of regular expressions according to the sorting result; and using the obtained regular expressions as the parsed text of the spoken language text information. This embodiment improves the accuracy of the semantic parsing result.
Description
Technical field
This application relates to the field of computer technology, specifically to the field of speech recognition, and more particularly to a method and apparatus for parsing the semantics of spoken language text information.
Background
Spoken language semantic parsing means understanding the information carried by a spoken voice signal. After semantic parsing is performed on the voice signal input by a user, retrieval can be carried out according to the parsed text of the spoken language text information, which increases the speed of information retrieval and improves the ability to update information.
A commonly used approach to spoken language semantic parsing first recognizes the spoken voice signal as spoken language text information and then parses that text with a rule-matching method to obtain the parsed text of the spoken language text information.
However, when the rule-matching method is applied to a single piece of spoken language text information, it tends to produce multiple parsed texts, and it cannot determine which of them is closest to the intention the user meant to express.
Summary of the invention
The purpose of this application is to propose an improved method and apparatus for parsing the semantics of spoken language text information, so as to solve the technical problem mentioned in the Background section above.
In a first aspect, this application provides a method for parsing the semantics of spoken language text information, the method comprising: segmenting received spoken language text information into words to extract features; determining the associated field of the spoken language text information from the nouns among the extracted features; in response to an extracted feature matching a preset feature of the associated field in a preset database, determining the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field, wherein the preset database includes weight values of preset features in multiple fields, and the multiple fields include the associated field; determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the associated field for the text information; sorting the scores and obtaining a preset number of regular expressions according to the sorting result; and using the obtained regular expressions as the parsed text of the spoken language text information.
In some embodiments, the weight values of a preset feature in the multiple fields are determined by the following processing: dividing the number of times the preset feature occurs in each of the multiple fields by the total number of words in the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field; dividing the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have already been parsed; and multiplying the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and, from the weight values of the preset feature in each field, obtaining the weight values of the preset feature in the multiple fields.
In some embodiments, determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the associated field for the text information comprises: in the associated field, adding up the weight values of those extracted features that hit a regular expression, to obtain the score of that regular expression of the associated field for the text information.
In some embodiments, determining, in response to an extracted feature matching a preset feature of the associated field in the preset database, the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field comprises: filtering out those extracted features that hit a preset filtering vocabulary, to obtain filtered features; and, in response to a filtered feature matching a preset feature of the associated field in the preset database, determining the weight value of the preset feature in the associated field as the weight value of the filtered feature in the associated field. Correspondingly, adding up, in the associated field, the weight values of those extracted features that hit a regular expression to obtain the score of that regular expression of the associated field for the text information comprises: in the associated field, adding up the weight values of those filtered features that hit the regular expression, to obtain the score of the regular expression for the text information.
In some embodiments, determining, based on the weight values of the extracted features in the associated field, the score of each regular expression of the associated field for the text information further comprises obtaining the regular expressions of the associated field for the text information through the following steps: identifying a type label of entity information from the extracted features; and, in response to the identified type label matching a preset type label carried by a regular expression of the associated field in the preset database, using the regular expression with the preset type label as a regular expression of the associated field for the text information, wherein the preset database includes regular expressions with preset type labels in the multiple fields.
In some embodiments, identifying a type label of entity information from the extracted features comprises: identifying, from the extracted features, the verbs and nouns in the entity information and the positional relationship between the verbs and the nouns. Correspondingly, using, in response to the identified type label matching a preset type label carried by a regular expression of the associated field in the preset database, the regular expression with the preset type label as a regular expression of the associated field for the text information comprises: in response to the identified verbs, nouns and verb-noun positional relationship matching the preset verbs, nouns and verb-noun positional relationship carried by a regular expression of the associated field in the preset database, using the regular expression with the preset verbs, nouns and verb-noun positional relationship as a regular expression of the associated field for the text information.
In a second aspect, this application provides an apparatus for parsing the semantics of spoken language text information, the apparatus comprising: a feature extraction module, configured to segment received spoken language text information into words to extract features; a field determination module, configured to determine the associated field of the spoken language text information from the nouns among the extracted features; a weight determination module, configured to, in response to an extracted feature matching a preset feature of the associated field in a preset database, determine the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field, wherein the preset database includes weight values of preset features in multiple fields, and the multiple fields include the associated field; a score determination module, configured to determine, based on the weight values of the extracted features in the associated field, the score of each regular expression of the associated field for the text information; an expression acquisition module, configured to sort the scores and obtain a preset number of regular expressions according to the sorting result; and a parsed text module, configured to use the obtained regular expressions as the parsed text of the spoken language text information.
In some embodiments, the weight values of a preset feature in the multiple fields used by the weight determination module are determined by the following modules: an occurrence frequency acquisition module, configured to divide the number of times the preset feature occurs in each of the multiple fields by the total number of words in the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field; an inverse document frequency acquisition module, configured to divide the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, wherein the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have already been parsed; and a weight value acquisition module, configured to multiply the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and, from the weight values of the preset feature in each field, to obtain the weight values of the preset feature in the multiple fields.
In some embodiments, the score determination module comprises: an addition submodule, configured to add up, in the associated field, the weight values of those extracted features that hit a regular expression, to obtain the score of that regular expression of the associated field for the text information.
In some embodiments, the weight determination module comprises: a feature filtering submodule, configured to filter out those extracted features that hit a preset filtering vocabulary, to obtain filtered features; and a weight determination submodule, configured to, in response to a filtered feature matching a preset feature of the associated field in the preset database, determine the weight value of the preset feature in the associated field as the weight value of the filtered feature in the associated field. Correspondingly, the addition submodule is configured to add up, in the associated field, the weight values of those filtered features that hit a regular expression, to obtain the score of the regular expression for the text information.
In some embodiments, the score determination module further comprises an expression determination module, which comprises: a type label identification module, configured to identify a type label of entity information from the extracted features; and an expression matching module, configured to, in response to the identified type label matching a preset type label carried by a regular expression of the associated field in the preset database, use the regular expression with the preset type label as a regular expression of the associated field for the text information, wherein the preset database includes regular expressions with preset type labels in the multiple fields.
In some embodiments, the type label identification module is further configured to identify, from the extracted features, the verbs and nouns in the entity information and the positional relationship between the verbs and the nouns; and the expression matching module is further configured to, in response to the identified verbs, nouns and verb-noun positional relationship matching the preset verbs, nouns and verb-noun positional relationship carried by a regular expression of the associated field in the preset database, use the regular expression with the preset verbs, nouns and verb-noun positional relationship as a regular expression of the associated field for the text information.
The method and apparatus for parsing the semantics of spoken language text information provided by this application segment the received spoken language text information into words to extract features, then determine the associated field of the spoken language text information from the nouns among the extracted features. Next, in response to an extracted feature matching a preset feature of the associated field in the preset database, the weight value of the preset feature in the associated field is determined as the weight value of the extracted feature in the associated field. Based on these weight values, the score of each regular expression of the associated field is determined for the text information, the scores are sorted, a preset number of regular expressions is obtained from the sorting result, and the obtained regular expressions are finally used as the parsed text of the spoken language text information. In this method, the weight value of an extracted feature in the associated field represents the importance of that feature in the associated field, and the parsed text of the spoken language text information is obtained according to this importance, which improves the accuracy of the semantic parsing result.
Brief description of the drawings
Other features, objects and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which this application can be applied;
Fig. 2 is a schematic flowchart of one embodiment of the method for parsing the semantics of spoken language text information according to this application;
Fig. 3 is an exemplary structural diagram of one embodiment of the apparatus for parsing the semantics of spoken language text information according to this application;
Fig. 4 is a schematic structural diagram of a computer system suitable for implementing the terminal device or server of the embodiments of this application.
Detailed description
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of this application and the features of those embodiments may be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or the apparatus for parsing the semantics of spoken language text information of this application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links or fiber optic cables.
A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various client applications that support spoken voice recognition can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support spoken voice recognition, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 105 may be a server that provides various services, for example a background server that supports the client applications with spoken voice recognition on the terminal devices 101, 102, 103. The background server can analyze and otherwise process received data such as spoken voice signals, and feed the processing result (for example, the spoken language semantic parsing result) back to the terminal device.
It should be noted that the method for parsing the semantics of spoken language text information provided by the embodiments of this application is generally executed by the server 105; correspondingly, the apparatus for parsing the semantics of spoken language text information is generally located in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires.
With continued reference to Fig. 2, Fig. 2 shows a flow 200 of one embodiment of the method for parsing the semantics of spoken language text information according to this application. The method comprises the following steps:
Step 201: segment the received spoken language text information into words to extract features.
In this embodiment, if the electronic device that receives the user's spoken voice signal has sufficient data processing capability itself, the method for parsing the semantics of spoken language text information can run directly on that device (for example, the terminal device or server shown in Fig. 1). If the electronic device that receives the user's spoken voice signal (for example, the terminal device shown in Fig. 1) does not have sufficient data processing capability, the received spoken voice signal can be transmitted to an electronic device with greater processing capability (for example, the server shown in Fig. 1), which recognizes the spoken voice signal as spoken language text information and then runs the method for parsing its semantics. The spoken language text information is obtained by recognizing the spoken voice signal. The recognition method may be any method for recognizing spoken voice signals, from the prior art or from technology developed in the future; this application places no limit on it. The above-mentioned wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband) and other wireless connection types currently known or developed in the future.
In this embodiment, segmenting the received spoken language text information means cutting it into multiple individual words. The segmentation may use any word segmentation method, currently known or developed in the future; this application places no limit on it. For example, existing segmentation algorithms fall into three broad categories: methods based on string matching, methods based on understanding, and methods based on statistics. Depending on whether segmentation is combined with part-of-speech tagging, methods can also be divided into pure segmentation methods and integrated methods that combine segmentation with tagging. Taking string-matching segmentation as an example, the spoken language text information can be matched, according to a certain strategy, against the entries of a sufficiently large machine dictionary; if a character string is found in the dictionary, the match succeeds, and multiple individual words are obtained in this way.
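The string-matching strategy described above can be sketched as greedy forward maximum matching, which is one common dictionary-based strategy and not necessarily the one the application has in mind. The function name `forward_max_match`, the toy dictionary and the `max_len` limit are illustrative assumptions, and the Chinese sentence is an assumed back-translation of the "I will go to Baidu mansion" example used in this description.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching against a word dictionary (illustrative sketch)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary contains it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback so the loop advances.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real machine dictionary would be far larger.
DICTIONARY = {"我", "要", "去", "百度大厦"}
segments = forward_max_match("我要去百度大厦", DICTIONARY)
```

Here `segments` holds the four words corresponding to "I", "want", "go" and "Baidu mansion". Statistical or understanding-based segmenters would replace this function without changing the later steps.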
After the received spoken language text information has been segmented, feature extraction can be performed on the resulting individual words to obtain the extracted features. A feature here is a basic unit used to represent text. Features generally have the following properties: they genuinely identify the content of the text; they can distinguish the target text from other texts; they are not too numerous; and they are relatively easy to separate. In Chinese text, characters, words or phrases can serve as the features that represent a text. Words have stronger expressive power than characters, and segmenting text into words is much less difficult than segmenting it into phrases. Most Chinese text classification systems therefore currently use words as features, called feature words. These feature words serve as an intermediate representation of a document and are used to compute the similarity between documents, and between a document and the user's target. If all words were used as features, the dimension of the feature vector would be too large and the amount of computation excessive, so feature extraction is needed. The main function of feature extraction is to reduce, as far as possible, the number of words to be processed without damaging the core information of the text, thereby reducing the dimension of the vector space, simplifying computation and improving the speed and efficiency of text processing. It should be understood that features may be extracted with any feature extraction method from the prior art or from future technology; this application places no limit on it. Taking the prior art as an example, feature extraction can be performed in at least the following ways: transforming the original features into a smaller number of new features by mapping or transformation; picking the most representative features from the original features; selecting the most influential features according to expert knowledge; and selecting features mathematically, finding those that carry the most classification information. In a specific application, a score can be computed for each feature with some feature evaluation function, the features can then be sorted by score, and the several features with the highest scores can be chosen as the extracted features.
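The last selection strategy above (score each feature, sort, keep the best few) can be sketched as follows. This is a minimal illustration under stated assumptions: `select_features`, the `TOY_SCORES` table and its values are hypothetical, and the feature evaluation function is left as a parameter because the text does not fix one.

```python
def select_features(words, score_fn, k):
    """Keep the k highest-scoring distinct words as features (illustrative sketch)."""
    # dict.fromkeys removes duplicates while preserving first-occurrence order.
    unique_words = list(dict.fromkeys(words))
    # sorted() is stable, so equally scored words keep their original order.
    ranked = sorted(unique_words, key=score_fn, reverse=True)
    return ranked[:k]


# Hypothetical scores standing in for a real feature evaluation function.
TOY_SCORES = {"I": 0.125, "want": 0.25, "go": 0.5, "Baidu mansion": 1.0}
top_two = select_features(
    ["I", "want", "go", "Baidu mansion"],
    lambda word: TOY_SCORES.get(word, 0.0),
    k=2,
)
```

With these toy scores, `top_two` contains "Baidu mansion" followed by "go".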
As an example, the short sentence "I will go to Baidu mansion" can be segmented to extract features. Segmentation yields the words "I", "want", "go" and "Baidu mansion", which gives the four features "I", "want", "go" and "Baidu mansion".
Step 202: determine the associated field of the spoken language text information from the nouns among the extracted features.
In this embodiment, the associated field of the spoken language text information can be determined from the nouns among the features extracted in step 201. For example, based on "Baidu mansion" above, the associated field of "I will go to Baidu mansion" can be determined to be the map field.
In this embodiment, the associated field of the spoken language text information may include, but is not limited to, one or more of the following fields: the music field, the map field, the address book field, the TV program field, the movie news field and the TV command field.
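Step 202 can be pictured as a lookup from nouns to fields. The text does not say how the noun-to-field association is stored, so the `NOUN_TO_FIELD` table, the part-of-speech tags and the function `determine_field` below are assumptions made purely for illustration.

```python
# Hypothetical noun-to-field lexicon; the application does not specify its contents.
NOUN_TO_FIELD = {
    "Baidu mansion": "map",
    "song": "music",
    "phone number": "address book",
}


def determine_field(features, pos_tags):
    """Return the field of the first noun feature found in the lexicon, else None."""
    for feature in features:
        if pos_tags.get(feature) == "noun" and feature in NOUN_TO_FIELD:
            return NOUN_TO_FIELD[feature]
    return None


# Part-of-speech tags would normally come from the segmentation and tagging step.
tags = {"I": "pronoun", "want": "verb", "go": "verb", "Baidu mansion": "noun"}
field = determine_field(["I", "want", "go", "Baidu mansion"], tags)
```

For the running example, `field` is "map", in line with the example in the text. A sentence whose nouns point to several fields would need a tie-breaking policy that this sketch does not attempt.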
Step 203: in response to an extracted feature matching a preset feature of the associated field in the preset database, determine the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field.
In this embodiment, the preset database may include, but is not limited to, weight values of preset features in multiple fields, where the multiple fields may include, but are not limited to, the associated field.
When matching the extracted features against the preset features in the preset database, the electronic device running the method can first locate the associated field in the preset database, which narrows the matching range and thus improves matching efficiency. It then matches the extracted features one by one against the preset features of the associated field in the preset database. If an extracted feature matches a preset feature of the associated field, the weight value of that preset feature in the associated field is determined as the weight value of the extracted feature in the associated field.
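The two-stage lookup described above (narrow to the associated field first, then match features one by one) might look like the following sketch. The nested-dictionary layout of `PRESET_DB`, its entries and weights, and the function `lookup_weights` are hypothetical stand-ins for the preset database.

```python
# Hypothetical preset database: field -> preset feature -> weight value in that field.
PRESET_DB = {
    "map": {"Baidu mansion": 0.75, "go": 0.25},
    "weather": {"weather": 0.5, "temperature": 0.5},
}


def lookup_weights(features, field, database):
    """Assign each extracted feature the weight of the matching preset feature."""
    # Restricting the search to the associated field narrows the matching range.
    field_weights = database.get(field, {})
    # Features without a matching preset feature simply receive no weight.
    return {f: field_weights[f] for f in features if f in field_weights}


weights = lookup_weights(["I", "want", "go", "Baidu mansion"], "map", PRESET_DB)
```

In this toy run only "go" and "Baidu mansion" receive weights, because "I" and "want" have no preset counterpart in the map field.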
In some optional implementations of this embodiment, the weight values of a preset feature in the multiple fields stored in the preset database are determined by the following processing: dividing the number of times the preset feature occurs in each of the multiple fields by the total number of words in the text information samples in which the preset feature occurs, to obtain the frequency with which the preset feature occurs in each field; dividing the number of text information samples in which the preset feature occurs by the total number of text information samples, to obtain the inverse document frequency of the preset feature, where the text information samples in which the preset feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have already been parsed; and multiplying the frequency with which the preset feature occurs in each field by the inverse document frequency of the preset feature, to obtain the weight value of the preset feature in each field, and, from the weight values of the preset feature in each field, obtaining the weight values of the preset feature in the multiple fields.
In this implementation, the weight values of a preset feature in the multiple fields are determined by computing the term frequency-inverse document frequency (TF-IDF). TF denotes the frequency with which the preset feature occurs in the text information samples of each field, and can be obtained by dividing the number of times the preset feature occurs by the total number of words in the text information samples in which it occurs; the more often the preset feature occurs in a document, the larger its TF value. IDF denotes the inverse document frequency, obtained by dividing the number of text information samples in which the preset feature occurs by the total number of text information samples; this means that, across the multiple fields, the fewer the text information samples in which the preset feature occurs, the larger its IDF value. The product of TF and IDF is the weight value of the preset feature, that is, the weight value of the preset feature in the multiple fields. For example, the preset feature "match" has a larger weight in the short sentence "please help me look up the schedule of the Warriors' match tomorrow" than in the short sentence "remind me to watch the match tomorrow"; in other words, the weight of the feature "match" in the sports event field is greater than its weight in the reminder field.
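The TF-IDF weighting can be sketched as below. One point is an assumption rather than a reading of the text: the IDF here uses the conventional form, the logarithm of the total number of samples divided by the number of samples containing the feature, so that a feature occurring in fewer samples gets a larger IDF, which is the behaviour the description states. The function names and the toy samples are hypothetical.

```python
import math


def term_frequency(feature, field_samples):
    """Occurrences of the feature divided by the word count of the samples containing it."""
    containing = [sample for sample in field_samples if feature in sample]
    total_words = sum(len(sample) for sample in containing)
    if total_words == 0:
        return 0.0
    return sum(sample.count(feature) for sample in containing) / total_words


def inverse_document_frequency(feature, all_samples):
    """Conventional IDF (an assumption): log(total samples / samples containing the feature)."""
    containing = sum(1 for sample in all_samples if feature in sample)
    if containing == 0:
        return 0.0
    return math.log(len(all_samples) / containing)


def field_weights(feature, samples_by_field):
    """Weight of the feature in every field: per-field TF times the shared IDF."""
    all_samples = [s for samples in samples_by_field.values() for s in samples]
    idf = inverse_document_frequency(feature, all_samples)
    return {
        field: term_frequency(feature, samples) * idf
        for field, samples in samples_by_field.items()
    }


# Hypothetical, already segmented historical samples grouped by field.
SAMPLES = {
    "sports": [["query", "Warriors", "match", "schedule"], ["match", "score"]],
    "reminder": [["remind", "me", "watch", "the", "match", "tomorrow"],
                 ["remind", "me", "meeting"]],
}
match_weights = field_weights("match", SAMPLES)
```

On this toy data "match" receives a larger weight in the sports field than in the reminder field, mirroring the example in the text. A feature that occurs in every sample gets an IDF of zero under this convention, which is one reason production systems often add smoothing.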
In some optional implementations of this embodiment, determining, in response to an extracted feature matching a preset feature of the associated field in the preset database, the weight value of the preset feature in the associated field as the weight value of the extracted feature in the associated field may include, but is not limited to: filtering out those extracted features that hit a preset filtering vocabulary, to obtain filtered features; and, in response to a filtered feature matching a preset feature of the associated field in the preset database, determining the weight value of the preset feature in the associated field as the weight value of the filtered feature in the associated field.
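The filtering step amounts to removing features that appear in a stop list before weights are looked up. The vocabulary `FILTER_WORDS` and the function `filter_features` below are illustrative assumptions; the text does not list the contents of the preset filtering vocabulary.

```python
# Hypothetical filtering vocabulary of words that carry little domain information.
FILTER_WORDS = {"I", "want", "please"}


def filter_features(features, filter_words):
    """Drop every extracted feature that hits the filtering vocabulary."""
    return [feature for feature in features if feature not in filter_words]


filtered = filter_features(["I", "want", "go", "Baidu mansion"], FILTER_WORDS)
```

Only the surviving features ("go" and "Baidu mansion" here) would then be matched against the preset features of the associated field.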
Step 204, based on the weighted values of the extracted features in the associated field, determine the score of the text information against each regular expression in the associated field.
In the present embodiment, after the weighted value of the default feature in the associated field has been determined as the weighted value of the extracted feature in the associated field, in response to the extracted feature matching a default feature of the associated field in the preset database, the score of the text information against each regular expression in the associated field can be determined from the weighted values of the extracted features in the associated field.
In some optional implementations of the present embodiment, determining the score of the text information against a regular expression in the associated field, based on the weighted values of the extracted features in the associated field, may include, but is not limited to: in the associated field, adding up the weighted values of those extracted features that hit the regular expression, to obtain the score of the text information against that regular expression in the associated field.
In this implementation, the score of a regular expression rule is the sum of the weighted values of the extracted features that hit it, namely:

$$\mathrm{Weight}_{\mathrm{Rule}} = \sum_{i=1}^{n} \mathrm{Weight}_{\mathrm{Feature}_i}$$

where $\mathrm{Weight}_{\mathrm{Rule}}$ denotes the weighted value of the regular expression rule, $\mathrm{Weight}_{\mathrm{Feature}_i}$ denotes the weighted value of the i-th feature, i ranges from 1 to n, and n is the number of extracted features that hit the regular expression.
Taking the weather field as an example, the weighted values of the different features in the weather field are approximately as shown in the table:
The regular expressions of the weather field are expressed as follows:
First regular expression: (weather) (how | good or not | how)?
Second regular expression: (temperature) (how many)? (degree)?
Then the short sentence "how is weather" matches the rule of the first regular expression, (weather) (how | good or not | how)?, and the score of that rule is:
0.0328802+0.00745463=0.04033483.
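The summation above can be sketched as follows, under stated assumptions. The function name `rule_score`, the English stand-in tokens, the rewritten pattern and the assignment of the two weights to "weather" and "how" are illustrative choices, since the table of weights is not reproduced here. In this sketch a feature is taken to hit a rule when the rule matches the whole sentence and the feature equals one of the captured groups.

```python
import re


def rule_score(pattern, features, weights):
    """Sum the weighted values of the extracted features that hit the rule."""
    match = re.fullmatch(pattern, " ".join(features))
    if match is None:
        return 0.0  # the sentence does not match this rule at all
    hit = {group for group in match.groups() if group}
    return sum(weights.get(f, 0.0) for f in features if f in hit)


# Assumed assignment of the two weights from the worked example.
weather_weights = {"weather": 0.0328802, "how": 0.00745463}
first_rule = r"(weather)(?: (how|good or not))?"
score = rule_score(first_rule, ["weather", "how"], weather_weights)
```

Under these assumptions `score` equals the sum in the worked example, approximately 0.04033483.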
In some optional implementations of the present embodiment, where the features that hit the preset filter vocabulary have been filtered out of the extracted features as described above to obtain filtered features, and where, in response to a filtered feature matching a default feature of the associated field in the preset database, the weighted value of the default feature in the associated field has been determined as the weighted value of the filtered feature in the associated field, the above adding up, in the associated field, of the weighted values of the extracted features that hit a regular expression may correspondingly include, but is not limited to: in the associated field, adding up the weighted values of those filtered features that hit the regular expression, to obtain the score of the text information against the regular expression.
In some optional implementations of the present embodiment, determining the score of the text information against a regular expression in the associated field, based on the weighted values of the extracted features in the associated field, may further include, but is not limited to, obtaining the regular expressions of the text information in the associated field by the following steps: identifying the type label of entity information from the extracted features; and, in response to the identified type label matching a preset type label possessed by a regular expression of the associated field in the preset database, taking the regular expression having that preset type label as a regular expression of the text information in the associated field, wherein the preset database may include, but is not limited to, regular expressions having preset type labels in the multiple fields.
The above identifying of the type label of entity information from the extracted features may include, but is not limited to: identifying, from the extracted features, the verbs and nouns of the entity information and the positional relationship between the verbs and nouns. The above taking, in response to the identified type label matching a preset type label possessed by a regular expression of the associated field in the preset database, of the regular expression having that preset type label as a regular expression of the text information in the associated field may include, but is not limited to: in response to the identified verbs and nouns and the positional relationship between them matching the preset verbs and nouns, and the positional relationship between them, possessed by a regular expression of the associated field in the preset database, taking the regular expression having those preset verbs and nouns and that positional relationship as a regular expression of the text information in the associated field.
The above identifying of the type label of entity information from the extracted features may be implemented with recognition methods known in the prior art or developed in the future; the present application does not limit this. For example, a conditional random field (CRF) algorithm may be used to identify the type label of entity information from the extracted features.
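A minimal sketch of the selection of regular expressions by type label follows. It assumes the type labels have already been identified (for example by a CRF tagger, which is not implemented here); the function name `select_rules`, the label names and the layout of the rule database are hypothetical.

```python
def select_rules(identified_labels, field, rule_db):
    """Return the regular expressions of the associated field whose preset
    type label matches one of the identified type labels."""
    labels = set(identified_labels)
    return [pattern for label, pattern in rule_db.get(field, []) if label in labels]


# Hypothetical preset database: field -> list of (preset type label, pattern).
rule_db = {
    "weather": [
        ("weather_query", r"(weather)(?: (how|good or not))?"),
        ("temperature_query", r"(temperature)(?: (how many))?(?: (degree))?"),
    ]
}
```

Only the rules selected in this way would then be scored, which keeps the summation confined to rules whose label fits the sentence.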
Step 205, sort the scores and obtain a preset quantity of regular expressions according to the sorting result.
In the present embodiment, the scores, determined in step 204, of the text information against the regular expressions in the associated field can be sorted. The preset quantity may be one or more, and the number of regular expressions to obtain can be determined by a setting made by the user or by technical developers. For example, the preset quantity may be set to three, in which case the three highest-ranked regular expressions are obtained after the scores are sorted from high to low; the preset quantity may also be set to one, in which case only the highest-ranked regular expression is obtained after the scores are sorted from high to low.
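The sorting and selection of step 205 can be sketched as below. The function name `top_rules` and the dictionary of scores are assumptions for illustration; the rule names and values in the usage note are invented.

```python
def top_rules(scores, preset_quantity=1):
    """Sort rules by score from high to low and keep the first preset_quantity."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [rule for rule, _ in ranked[:preset_quantity]]
```

For instance, with the hypothetical scores `{"a": 0.04, "b": 0.01, "c": 0.2, "d": 0.1}`, a preset quantity of three yields the rules "c", "d" and "a" in that order, and a preset quantity of one yields only "c".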
Step 206, take the obtained regular expressions as the parsed text of the spoken language text information.
In the present embodiment, the preset quantity of regular expressions obtained in step 205 according to the sorting result can be taken as the parsed text of the spoken language text information. For example, the three highest-ranked regular expressions obtained above are taken as the parsed text of the spoken language text information, or the single highest-ranked regular expression is taken as the parsed text of the spoken language text information.
When more than one regular expression is obtained, the regular expressions obtained, that is, the parsed texts, can further be presented to the user for selection, so as to improve the accuracy of parsing and improve the user experience.
The method provided by the above embodiment of the present application segments the received spoken language text information into words to extract features; then determines the associated field of the spoken language text information from the nouns among the extracted features; then, in response to an extracted feature matching a default feature of the associated field in the preset database, determines the weighted value of the default feature in the associated field as the weighted value of the extracted feature in the associated field; then, based on the weighted values of the extracted features in the associated field, determines the score of the text information against each regular expression in the associated field; then sorts the scores and obtains a preset quantity of regular expressions based on the sorting result; and finally takes the obtained regular expressions as the parsed text of the spoken language text information, which improves the accuracy of the semantic parsing result obtained.
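The overall flow summarized above can be sketched end to end as follows. This is an illustrative toy, not the implementation of the embodiment: whitespace splitting stands in for word segmentation, a noun-to-field lookup stands in for field determination, and every table, value, pattern and name below is hypothetical.

```python
import re

# Hypothetical toy data; none of these values come from the embodiment.
NOUN_TO_FIELD = {"weather": "weather", "temperature": "weather", "match": "sports"}
WEIGHTS = {"weather": {"weather": 0.5, "how": 0.25, "temperature": 0.5, "degree": 0.125}}
RULES = {
    "weather": [
        r"(weather)(?: (how))?",
        r"(temperature)(?: (how many))?(?: (degree))?",
    ]
}


def parse(text, preset_quantity=1):
    """Return the highest-scoring regular expressions for the text."""
    features = text.lower().split()  # stand-in for word segmentation
    # Determine the associated field from the first known noun.
    field = next((NOUN_TO_FIELD[f] for f in features if f in NOUN_TO_FIELD), None)
    if field is None:
        return []
    # Copy the weighted values of matching default features of the field.
    field_weights = WEIGHTS.get(field, {})
    weights = {f: field_weights[f] for f in features if f in field_weights}
    # Score every rule of the field that matches the sentence.
    scored = []
    for pattern in RULES.get(field, []):
        match = re.fullmatch(pattern, " ".join(features))
        if match:
            hit = {group for group in match.groups() if group}
            scored.append((pattern, sum(w for f, w in weights.items() if f in hit)))
    # Sort from high to low and keep the preset quantity.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pattern for pattern, _ in scored[:preset_quantity]]
```

Calling `parse("weather how")` on this toy data returns the first weather rule, and a sentence containing no known noun returns an empty list.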
With further reference to Fig. 3, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for parsing the semantics of spoken language text information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can specifically be applied in various electronic devices.
As shown in Fig. 3, the apparatus 300 for parsing the semantics of spoken language text information described in the present embodiment may include, but is not limited to: a feature extraction module 310, a field determination module 320, a weight determination module 330, a score determination module 340, an expression acquisition module 350 and a parsing text module 360.
The feature extraction module 310 is configured to segment received spoken language text information into words to extract features. The field determination module 320 is configured to determine the associated field of the spoken language text information from the nouns among the extracted features. The weight determination module 330 is configured to, in response to an extracted feature matching a default feature of the associated field in the preset database, determine the weighted value of the default feature in the associated field as the weighted value of the extracted feature in the associated field, wherein the preset database may include, but is not limited to, weighted values of default features in multiple fields, and the multiple fields may include, but are not limited to, the associated field. The score determination module 340 is configured to determine, based on the weighted values of the extracted features in the associated field, the score of the text information against each regular expression in the associated field. The expression acquisition module 350 is configured to sort the scores and obtain a preset quantity of regular expressions according to the sorting result. The parsing text module 360 is configured to take the obtained regular expressions as the parsed text of the spoken language text information.
In some optional implementations of the present embodiment, the weighted values of the default feature in the multiple fields used by the weight determination module 330 are determined by the following modules (not shown in the figure): an occurrence frequency obtaining module, an inverse document frequency obtaining module and a weighted value obtaining module. The occurrence frequency obtaining module is configured to divide, in each of the multiple fields, the number of occurrences of the default feature by the total word count of the text information samples in which the default feature occurs, to obtain the frequency with which the default feature occurs in that field. The inverse document frequency obtaining module is configured to divide the number of text information samples in which the default feature occurs by the total number of text information samples, to obtain the inverse document frequency of the default feature, wherein the text information samples in which the default feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have already been parsed. The weighted value obtaining module is configured to multiply the frequency with which the default feature occurs in each field by the inverse document frequency of the default feature, to obtain the weighted value of the default feature in each field, and to obtain the weighted values of the default feature in the multiple fields from its weighted values in the individual fields.
In some optional implementations of the present embodiment, the score determination module 340 may include, but is not limited to (not shown in the figure): an addition submodule, configured to add up, in the associated field, the weighted values of those extracted features that hit a regular expression, to obtain the score of the text information against that regular expression in the associated field.
In some optional implementations of the present embodiment, the weight determination module 330 may include, but is not limited to (not shown in the figure): a feature filtering submodule, configured to filter out, from the extracted features, the features that hit a preset filter vocabulary, to obtain filtered features; and a weight determination submodule, configured to, in response to a filtered feature matching a default feature of the associated field in the preset database, determine the weighted value of the default feature in the associated field as the weighted value of the filtered feature in the associated field. The addition submodule may be further configured to: in the associated field, add up the weighted values of those filtered features that hit a regular expression, to obtain the score of the text information against the regular expression.
In some optional implementations of the present embodiment, the score determination module 340 may further include, but is not limited to (not shown in the figure), an expression determination module, which may include, but is not limited to: a type label identification module, configured to identify the type label of entity information from the extracted features; and an expression matching module, configured to, in response to the identified type label matching a preset type label possessed by a regular expression of the associated field in the preset database, take the regular expression having that preset type label as a regular expression of the text information in the associated field, wherein the preset database may include, but is not limited to, regular expressions having preset type labels in the multiple fields.
In some optional implementations of the present embodiment, the type label identification module is further configured to identify, from the extracted features, the verbs and nouns of the entity information and the positional relationship between the verbs and nouns; and the expression matching module is further configured to, in response to the identified verbs and nouns and the positional relationship between them matching the preset verbs and nouns, and the positional relationship between them, possessed by a regular expression of the associated field in the preset database, take the regular expression having those preset verbs and nouns and that positional relationship as a regular expression of the text information in the associated field.
Those skilled in the art will understand that the above apparatus 300 for parsing the semantics of spoken language text information further includes other well-known structures, such as a processor and a memory.
It should be understood that the modules recorded in the apparatus 300 correspond to the steps of the method described with reference to Fig. 2. Accordingly, the operations and features described above for the method for parsing the semantics of spoken language text information apply equally to the apparatus 300 and the modules it contains, and are not repeated here. The corresponding modules in the apparatus 300 can cooperate with modules in a terminal device and/or a server to implement the solutions of the embodiments of the present application.
Reference is now made to Fig. 4, which shows a structural schematic diagram of a computer system 400 of a terminal device or server suitable for implementing the embodiments of the present application.
As shown in Fig. 4, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores the various programs and data required for the operation of the system 400. The CPU 401, the ROM 402 and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse and the like; an output section 407 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, and a loudspeaker and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor includes a feature extraction module, a field determination module, a weight determination module, a score determination module, an expression acquisition module and a parsing text module. The names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for example, the feature extraction module may also be described as "a module that segments received spoken language text information into words to extract features".
As another aspect, the present application also provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the non-volatile computer storage medium included in the apparatus described in the above embodiments, or it may be a non-volatile computer storage medium that exists alone and is not assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: segment received spoken language text information into words to extract features; determine the associated field of the spoken language text information from the nouns among the extracted features; in response to an extracted feature matching a default feature of the associated field in a preset database, determine the weighted value of the default feature in the associated field as the weighted value of the extracted feature in the associated field, wherein the preset database may include, but is not limited to, weighted values of default features in multiple fields, and the multiple fields may include, but are not limited to, the associated field; determine, based on the weighted values of the extracted features in the associated field, the score of the text information against each regular expression in the associated field; sort the scores and obtain a preset quantity of regular expressions according to the sorting result; and take the obtained regular expressions as the parsed text of the spoken language text information.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.
Claims (12)
1. A method for parsing the semantics of spoken language text information, comprising:
segmenting received spoken language text information into words to extract features;
determining an associated field of the spoken language text information from the nouns among the extracted features;
in response to an extracted feature matching a default feature of the associated field in a preset database, determining the weighted value of the default feature in the associated field as the weighted value of the extracted feature in the associated field, wherein the preset database includes weighted values of default features in multiple fields, and the multiple fields include the associated field;
determining, based on the weighted values of the extracted features in the associated field, the score of the text information against each regular expression in the associated field;
sorting the scores, and obtaining a preset quantity of regular expressions according to the sorting result; and
taking the obtained regular expressions as the parsed text of the spoken language text information.
2. The method according to claim 1, characterized in that the weighted values of the default feature in the multiple fields are determined by the following processing:
in each of the multiple fields, dividing the number of occurrences of the default feature by the total word count of the text information samples in which the default feature occurs, to obtain the frequency with which the default feature occurs in that field;
dividing the number of text information samples in which the default feature occurs by the total number of text information samples, to obtain the inverse document frequency of the default feature, wherein the text information samples in which the default feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have already been parsed; and
multiplying the frequency with which the default feature occurs in each field by the inverse document frequency of the default feature, to obtain the weighted value of the default feature in each field, and obtaining the weighted values of the default feature in the multiple fields from its weighted values in the individual fields.
3. The method according to any one of claims 1 or 2, characterized in that the determining, based on the weighted values of the extracted features in the associated field, of the score of the text information against a regular expression in the associated field comprises:
in the associated field, adding up the weighted values of those extracted features that hit the regular expression, to obtain the score of the text information against the regular expression in the associated field.
4. The method according to claim 3, characterized in that the determining, in response to an extracted feature matching a default feature of the associated field in the preset database, of the weighted value of the default feature in the associated field as the weighted value of the extracted feature in the associated field comprises: filtering out, from the extracted features, the features that hit a preset filter vocabulary, to obtain filtered features; and, in response to a filtered feature matching a default feature of the associated field in the preset database, determining the weighted value of the default feature in the associated field as the weighted value of the filtered feature in the associated field; and
the adding up, in the associated field, of the weighted values of those extracted features that hit the regular expression, to obtain the score of the text information against the regular expression in the associated field, comprises: in the associated field, adding up the weighted values of those filtered features that hit the regular expression, to obtain the score of the text information against the regular expression.
5. The method according to claim 4, characterized in that the determining, based on the weighted values of the extracted features in the associated field, of the score of the text information against a regular expression in the associated field further comprises:
obtaining the regular expressions of the text information in the associated field by the following steps:
identifying the type label of entity information from the extracted features; and
in response to the identified type label matching a preset type label possessed by a regular expression of the associated field in the preset database, taking the regular expression having that preset type label as a regular expression of the text information in the associated field, wherein the preset database includes regular expressions having preset type labels in the multiple fields.
6. The method according to claim 5, characterized in that the identifying of the type label of entity information from the extracted features comprises: identifying, from the extracted features, the verbs and nouns of the entity information and the positional relationship between the verbs and nouns; and
the taking, in response to the identified type label matching a preset type label possessed by a regular expression of the associated field in the preset database, of the regular expression having that preset type label as a regular expression of the text information in the associated field comprises: in response to the identified verbs and nouns and the positional relationship between them matching the preset verbs and nouns, and the positional relationship between them, possessed by a regular expression of the associated field in the preset database, taking the regular expression having those preset verbs and nouns and that positional relationship as a regular expression of the text information in the associated field.
7. An apparatus for parsing the semantics of spoken language text information, comprising:
a feature extraction module, for segmenting received spoken language text information into words to extract features;
a field determination module, for determining an associated field of the spoken language text information from the nouns among the extracted features;
a weight determination module, for determining, in response to an extracted feature matching a default feature of the associated field in a preset database, the weighted value of the default feature in the associated field as the weighted value of the extracted feature in the associated field, wherein the preset database includes weighted values of default features in multiple fields, and the multiple fields include the associated field;
a score determination module, for determining, based on the weighted values of the extracted features in the associated field, the score of the text information against each regular expression in the associated field;
an expression acquisition module, for sorting the scores and obtaining a preset quantity of regular expressions according to the sorting result; and
a parsing text module, for taking the obtained regular expressions as the parsed text of the spoken language text information.
8. The apparatus according to claim 7, characterized in that the weighted values of the default feature in the multiple fields used in the weight determination module are determined by the following modules:
an occurrence frequency obtaining module, for dividing, in each of the multiple fields, the number of occurrences of the default feature by the total word count of the text information samples in which the default feature occurs, to obtain the frequency with which the default feature occurs in that field;
an inverse document frequency obtaining module, for dividing the number of text information samples in which the default feature occurs by the total number of text information samples, to obtain the inverse document frequency of the default feature, wherein the text information samples in which the default feature occurs and the total text information samples are obtained from historical data of spoken language text information whose semantics have already been parsed; and
a weighted value obtaining module, for multiplying the frequency with which the default feature occurs in each field by the inverse document frequency of the default feature, to obtain the weighted value of the default feature in each field, and for obtaining the weighted values of the default feature in the multiple fields from its weighted values in the individual fields.
9. The apparatus according to any one of claims 7 or 8, characterized in that the score determination module comprises:
an addition submodule, for adding up, in the associated field, the weighted values of those extracted features that hit a regular expression, to obtain the score of the text information against the regular expression in the associated field.
10. The apparatus according to claim 9, characterized in that the weight determination module comprises: a feature filtering submodule, for filtering out, from the extracted features, the features that hit a preset filter vocabulary, to obtain filtered features; and a weight determination submodule, for determining, in response to a filtered feature matching a default feature of the associated field in the preset database, the weighted value of the default feature in the associated field as the weighted value of the filtered feature in the associated field; and
the addition submodule is further for: in the associated field, adding up the weighted values of those filtered features that hit the regular expression, to obtain the score of the text information against the regular expression.
11. The apparatus according to claim 10, wherein the score determination module further comprises:
an expression determination module, comprising:
a type label recognition module, configured to recognize type labels of entity information from the
extracted features; and
an expression matching module, configured to, in response to a recognized type label matching a preset
type label of a regular expression of the associated field in the preset database, take the regular
expression having the preset type label as a regular expression of the text information in the
associated field, wherein the preset database includes regular expressions having preset type labels in
the multiple fields.
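One way to picture the expression matching of claim 11 is sketched below. It assumes that entity type labels come from a simple lookup table (`gazetteer`) and that an expression "matches" when all of its preset type labels were recognized in the input. Both choices, along with every name in the code, are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class TaggedExpression(NamedTuple):
    pattern: str
    type_labels: FrozenSet[str]  # preset type labels attached to the expression


def recognize_type_labels(features, gazetteer):
    """Return the set of entity type labels found among the features.

    Assumption: `gazetteer` maps an entity string to its type label.
    """
    return {gazetteer[f] for f in features if f in gazetteer}


def match_expressions(recognized_labels, database, field):
    """Select the field's expressions whose preset labels were all recognized.

    Assumption: `database` maps field -> list of TaggedExpression.
    Expressions without any preset label are skipped.
    """
    return [
        e for e in database.get(field, [])
        if e.type_labels and e.type_labels <= recognized_labels
    ]
```

Selecting candidates by type label first narrows the set of expressions that later need to be scored.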
12. The apparatus according to claim 11, wherein the type label recognition module is further
configured to: recognize, from the extracted features, the verbs and nouns in the entity information and
the positional relationships between the verbs and the nouns; and
the expression matching module is further configured to: in response to the recognized verbs, nouns and
verb-noun positional relationships matching the preset verbs, nouns and verb-noun positional
relationships of a regular expression of the associated field in the preset database, take the regular
expression having those preset verbs, nouns and positional relationships as a regular expression of the
text information in the associated field.
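The verb-noun matching of claim 12 might look like the sketch below. It assumes the features arrive in sentence order with part-of-speech tags (`"v"` for verbs, `"n"` for nouns) and that the positional relationship is simply which of the two comes first. The tag set, the relation names and all identifiers are hypothetical.

```python
from typing import NamedTuple


class RelationExpression(NamedTuple):
    pattern: str
    verb: str
    noun: str
    position: str  # "verb_before_noun" or "noun_before_verb"


def verb_noun_relations(tagged_features):
    """Collect (verb, noun, position) triples from [(token, pos_tag), ...]."""
    relations = set()
    for i, (verb, pos_i) in enumerate(tagged_features):
        if pos_i != "v":
            continue
        for j, (noun, pos_j) in enumerate(tagged_features):
            if pos_j == "n":
                position = "verb_before_noun" if i < j else "noun_before_verb"
                relations.add((verb, noun, position))
    return relations


def match_by_relation(tagged_features, database, field):
    """Keep the field's expressions whose preset triple occurs in the input.

    Assumption: `database` maps field -> list of RelationExpression.
    """
    relations = verb_noun_relations(tagged_features)
    return [
        e for e in database.get(field, [])
        if (e.verb, e.noun, e.position) in relations
    ]
```

Including the word order in the triple lets two expressions that share the same verb and noun be told apart.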
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510977813.0A CN105786793B (en) | 2015-12-23 | 2015-12-23 | Parse the semantic method and apparatus of spoken language text information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105786793A (en) | 2016-07-20 |
CN105786793B (en) | 2019-05-28 |
Family
ID=56390284
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510977813.0A Active CN105786793B (en) | 2015-12-23 | 2015-12-23 | Parse the semantic method and apparatus of spoken language text information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105786793B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106547742B (en) * | 2016-11-30 | 2019-05-03 | 百度在线网络技术(北京)有限公司 | Semantic parsing result treating method and apparatus based on artificial intelligence |
CN106649278B (en) * | 2016-12-30 | 2019-11-15 | 三星电子(中国)研发中心 | Extend the method and system of spoken dialogue system corpus |
CN109388796B (en) * | 2017-08-11 | 2023-04-18 | 北京国双科技有限公司 | Method and device for pushing referee document |
CN107705784B (en) * | 2017-09-28 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Text regularization model training method and device, and text regularization method and device |
CN108197105B (en) * | 2017-12-28 | 2021-08-24 | Oppo广东移动通信有限公司 | Natural language processing method, device, storage medium and electronic equipment |
CN109388700A (en) * | 2018-10-26 | 2019-02-26 | 广东小天才科技有限公司 | A kind of intension recognizing method and system |
CN109800423A (en) * | 2018-12-21 | 2019-05-24 | 广州供电局有限公司 | Method and apparatus are determined based on the power-off event of power failure plan sentence |
CN111401057B (en) * | 2018-12-29 | 2023-11-14 | 深圳Tcl新技术有限公司 | Semantic analysis method, storage medium and terminal equipment |
CN109783821B (en) * | 2019-01-18 | 2023-06-27 | 广东小天才科技有限公司 | Method and system for searching video of specific content |
CN109766555B (en) * | 2019-01-18 | 2023-06-27 | 广东小天才科技有限公司 | Method and system for acquiring semantic slots of user sentences |
CN109800430B (en) * | 2019-01-18 | 2023-06-27 | 广东小天才科技有限公司 | Semantic understanding method and system |
CN112151019A (en) * | 2019-06-26 | 2020-12-29 | 阿里巴巴集团控股有限公司 | Text processing method and device and computing equipment |
CN111680136B (en) * | 2020-04-28 | 2023-08-25 | 平安科技(深圳)有限公司 | Method and device for semantic matching of spoken language |
CN113064981A (en) * | 2021-03-26 | 2021-07-02 | 北京达佳互联信息技术有限公司 | Group head portrait generation method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103425635A (en) * | 2012-05-15 | 2013-12-04 | 北京百度网讯科技有限公司 | Method and device for recommending answers |
CN105095186A (en) * | 2015-07-28 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Semantic analysis method and device |
CN105138575A (en) * | 2015-07-29 | 2015-12-09 | 百度在线网络技术(北京)有限公司 | Analysis method and device of voice text string |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130068612A (en) * | 2011-12-15 | 2013-06-26 | 한국전자통신연구원 | Apparatus and method for normalizing text |
Also Published As
Publication number | Publication date |
---|---|
CN105786793A (en) | 2016-07-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105786793B (en) | Parse the semantic method and apparatus of spoken language text information | |
CN105654950B (en) | Adaptive voice feedback method and device | |
CN108241667B (en) | Method and apparatus for pushed information | |
CN107766371B (en) | Text information classification method and device | |
CN109117777A (en) | The method and apparatus for generating information | |
CN110209812B (en) | Text classification method and device | |
WO2017024553A1 (en) | Information emotion analysis method and system | |
CN110349564A (en) | Across the language voice recognition methods of one kind and device | |
CN112395420A (en) | Video content retrieval method and device, computer equipment and storage medium | |
US20180329985A1 (en) | Method and Apparatus for Compressing Topic Model | |
CN114549874A (en) | Training method of multi-target image-text matching model, image-text retrieval method and device | |
US10915756B2 (en) | Method and apparatus for determining (raw) video materials for news | |
CN111160007B (en) | Search method and device based on BERT language model, computer equipment and storage medium | |
CN115982376B (en) | Method and device for training model based on text, multimode data and knowledge | |
CN107943895A (en) | Information-pushing method and device | |
US20220121668A1 (en) | Method for recommending document, electronic device and storage medium | |
CN109284367A (en) | Method and apparatus for handling text | |
CN110457694A (en) | Message prompt method and device, scene type identification based reminding method and device | |
CN111861596A (en) | Text classification method and device | |
CN109190123A (en) | Method and apparatus for output information | |
CN106815224A (en) | Service acquisition method and apparatus | |
CN114298007A (en) | Text similarity determination method, device, equipment and medium | |
CN110245357A (en) | Principal recognition methods and device | |
CN109213916A (en) | Method and apparatus for generating information | |
CN109670111A (en) | Method and apparatus for pushed information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||