CN109299439A - Digital extraction method and apparatus, storage medium and electronic device - Google Patents
- Publication number
- CN109299439A CN201810961840.2A
- Authority
- CN
- China
- Prior art keywords
- instruction
- target
- participle
- extraction
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a digital extraction method and apparatus, a storage medium, and an electronic device. The method comprises: obtaining an instruction text that matches an input query instruction; performing word segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the set is configured with a part-of-speech label; determining target instruction segments from the instruction segment set according to the part-of-speech labels, wherein the target instruction segments contain valid number information; and extracting, from the instruction text, a target number that matches the valid number information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize. The invention solves the technical problem in the related art that digital extraction has low accuracy.
Description
Technical field
The present invention relates to the field of computers, and in particular to a digital extraction (number extraction) method and apparatus, a storage medium, and an electronic device.
Background
Instructions that a user inputs to a hardware device often carry number information, for example information containing numerals that indicate currency, time, length, distance, and the like. So that the hardware device can perform the corresponding machine processing on the numbers carried in this number information, the numbers generally have to be extracted from the instruction first.
Currently, after a hardware device obtains the instruction text corresponding to an instruction, the common extraction approach is to run a simple match over the instruction text with a regular expression, so as to extract the numbers carried by the number information in the instruction text. However, special numbers often appear in instruction text, such as Chinese-character numerals that carry no numeric meaning, or composite numbers in which Chinese-character numerals and Arabic numerals are mixed. For such special numbers, continuing to use the extraction method of the related art leads to low extraction accuracy.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a digital extraction method and apparatus, a storage medium, and an electronic device, so as to at least solve the technical problem in the related art that digital extraction has low accuracy.
According to one aspect of the embodiments of the present invention, a digital extraction method is provided, comprising: obtaining an instruction text that matches an input query instruction; performing segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the instruction segment set is configured with a part-of-speech label; determining target instruction segments from the instruction segment set according to the part-of-speech labels, wherein the target instruction segments contain valid number information; and extracting, from the instruction text, a target number that matches the valid number information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize.
According to another aspect of the embodiments of the present invention, a digital extraction apparatus is further provided, comprising: an acquiring unit, configured to obtain an instruction text that matches an input query instruction; a processing unit, configured to perform segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the instruction segment set is configured with a part-of-speech label; a determination unit, configured to determine target instruction segments from the instruction segment set according to the part-of-speech labels, wherein the target instruction segments contain valid number information; and an extraction unit, configured to extract, from the instruction text, a target number that matches the valid number information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize.
As an optional example, the extraction unit includes: a third extraction module, configured to, after the number format of the numbers carried in the valid number information is obtained, extract the number carried by the valid number information as the target number when the number format is Arabic numerals.
As an optional example, the determination unit includes: a third obtaining module, configured to obtain, from the instruction segment set, the instruction segments whose part-of-speech label indicates a number, as the target instruction segments, wherein the instruction segments whose part-of-speech label indicates a number contain the valid number information.
As an optional example, the acquiring unit includes at least one of the following: a fourth obtaining module, configured to obtain the query instruction input by voice, recognize the instruction information carried in the query instruction, and generate the instruction text according to the instruction information; and a fifth obtaining module, configured to obtain the query instruction input through an input device, and parse the query instruction to obtain the instruction text.
According to yet another aspect of the embodiments of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, and the computer program is configured to execute the above digital extraction method when run.
According to yet another aspect of the embodiments of the present invention, an electronic device is further provided, including a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor executes the above digital extraction method by means of the computer program.
In the embodiments of the present invention, the following method is used: an instruction text that matches an input query instruction is obtained; segmentation and tagging is performed on the instruction text to obtain an instruction segment set, wherein each instruction segment in the set is configured with a part-of-speech label; target instruction segments are determined from the instruction segment set according to the part-of-speech labels; and a target number that matches the valid number information is extracted from the instruction text according to the positional relationship between the target instruction segments in the instruction segment set. Because the instruction text is first segmented and tagged, and each instruction segment in the resulting set carries a part-of-speech label, the target instruction segments can be picked out by label and the target number can then be extracted according to the positional relationship between those segments. The target number can thus be extracted accurately and efficiently, which improves extraction accuracy and solves the technical problem in the related art that digital extraction has low accuracy.
Brief description of the drawings
The drawings described herein provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of an application environment of a digital extraction method according to an embodiment of the present invention;
Fig. 2 is a flow diagram of a digital extraction method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a digital extraction method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 9 is a structural schematic diagram of a digital extraction apparatus according to an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units that are expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to one aspect of the embodiments of the present invention, a digital extraction method is provided. Optionally, as an optional embodiment, the digital extraction method may be, but is not limited to being, applied in the environment shown in Fig. 1.
Human-computer interaction may take place between a user 102 and a user equipment 104. The user equipment 104 includes a memory 106 and a processor 108. The user equipment 104 may obtain a query instruction input by the user and, according to the query instruction, obtain the instruction text that matches it. After obtaining the instruction text, the user equipment 104 sends it to a server 112 over a network. The server 112 includes an index database 114, a segmentation engine 116, and an extraction engine 118. After the server 112 obtains the instruction text, it may store the instruction text in the index database 114. The segmentation engine 116 then segments the instruction text to obtain a segment set. The extraction engine 118 extracts the target number according to the positional relationship between the target instruction segments in the segment set. The server 112 returns the target number to the user equipment 104.
It should be noted that, in the related art, text often contains Chinese-character numerals with no numeric meaning or composite numbers, so the numbers obtained from the text are not very accurate. In this embodiment, because the instruction text is first segmented and tagged, and each instruction segment in the resulting instruction segment set carries a part-of-speech label, the target instruction segments can be picked out by label and the target number can then be extracted according to the positional relationship between those segments. The target number can thus be extracted accurately and efficiently, which improves extraction accuracy.
Optionally, the digital extraction method may be, but is not limited to being, applied to terminals capable of computing data, such as notebook computers, PCs, smartphones, smart speakers, smart home devices, and head-mounted devices. The network may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, WIFI, and other networks that implement wireless communication. The wired network may include, but is not limited to, a wide area network, a metropolitan area network, and a local area network. The server may include, but is not limited to, any hardware device capable of computing.
Optionally, as an optional embodiment, as shown in Fig. 2, the digital extraction method includes:
S202, obtaining an instruction text that matches an input query instruction;
S204, performing segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the instruction segment set is configured with a part-of-speech label;
S206, determining target instruction segments from the instruction segment set according to the part-of-speech labels, wherein the target instruction segments contain valid number information;
S208, extracting, from the instruction text, a target number that matches the valid number information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize.
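The flow of steps S202 to S208 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Segment` type, the label name `"number"`, and the hand-tagged Chinese sample (an assumed reconstruction of a phrase meaning "the loss will exceed 10 million yuan") are all hypothetical, and the sketch only groups number-labelled segments by position without converting them to values.

```python
from dataclasses import dataclass
from typing import List, Set

NUMBER = "number"  # assumed name for the number part-of-speech label


@dataclass
class Segment:
    text: str         # the field as it appears in the instruction text
    labels: Set[str]  # one or more part-of-speech labels


def select_targets(segments: List[Segment]) -> List[int]:
    """S206: positions of the segments that carry a number label."""
    return [i for i, seg in enumerate(segments) if NUMBER in seg.labels]


def group_adjacent(positions: List[int]) -> List[List[int]]:
    """S208, first half: split target positions into runs of consecutive positions."""
    groups: List[List[int]] = []
    for pos in positions:
        if groups and pos == groups[-1][-1] + 1:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


def extract_number_fields(segments: List[Segment]) -> List[str]:
    """S208, second half: join each run into one field (value conversion omitted)."""
    groups = group_adjacent(select_targets(segments))
    return ["".join(segments[i].text for i in group) for group in groups]


# Stand-in for S202/S204: a hand-tagged instruction text (hypothetical tagging).
tagged = [
    Segment("损失", {"noun"}),
    Segment("将", {"adverb"}),
    Segment("超过", {"verb"}),
    Segment("1000", {NUMBER}),
    Segment("万", {NUMBER}),
    Segment("元", {"noun"}),
]
print(extract_number_fields(tagged))
```

Keeping selection by label separate from grouping by position mirrors the split between S206 and S208, so either half could be replaced without touching the other.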
Optionally, as an optional embodiment, the digital extraction method may be, but is not limited to being, applied to the compilation of financial statistics reports, to land area assessment, or to a population census.
With the above method, because the instruction text is first segmented and tagged, and each instruction segment in the resulting instruction segment set carries a part-of-speech label, the target instruction segments can be picked out by label and the target number can then be extracted according to the positional relationship between those segments. The target number can thus be extracted accurately and efficiently, which improves extraction accuracy.
Optionally, the instruction text that matches the input query instruction may be obtained in, but not limited to, the following ways:
(1) An input box is shown on the display interface of a terminal. When content entered in the input box is received, the received content is used as the instruction text.
For example, an input box is shown on the display interface of the terminal. After the words "occupied area: 1000 mu" are entered in the input box, "occupied area: 1000 mu" is used as the instruction text.
(2) A picture carrying the instruction text is received, text information is recognized from the picture, and the recognized text information is used as the instruction text.
For example, the terminal receives a picture carrying the words "occupied area: 1000 mu". The picture is first recognized to identify the words "occupied area: 1000 mu", the text is then collected, and the collected "occupied area: 1000 mu" is used as the instruction text.
(3) After a selection instruction is received, the selected text is used as the instruction text.
Optionally, receiving the selection instruction may be, but is not limited to, detecting that a button on the display interface of the terminal is pressed, or receiving a voice instruction input by the user.
For example, a button and text content may be shown on the display interface of the terminal. When the button is pressed, the text content selected by the user is used as the instruction text, and the subsequent digital extraction process is executed.
(4) Voice input information is obtained, and the obtained voice input information is used as the instruction text.
For example, voice input information from the user is received, such as "occupied area: 1000 mu". The obtained voice information is then converted to text information and used as the instruction text.
Optionally, performing segmentation and tagging on the instruction text may be, but is not limited to, splitting the obtained instruction text into multiple individual fields and adding a part-of-speech label to each field.
Optionally, adding a part-of-speech label to each field may be, but is not limited to, judging the part of speech of each field. When the part of speech is noun, a noun label is added to the field; when the part of speech is number, a number label is added to the field; when the part of speech is verb, a verb label is added to the field; when the part of speech is adjective, an adjective label is added to the field; when the part of speech is adverb, an adverb label is added to the field; and when the part of speech is character, a character label is added to the field.
Optionally, each field may correspond to one or more part-of-speech labels.
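The splitting and labelling described above might look like the following toy sketch. It is only an illustration: the lexicon, the label names, and the greedy longest-match strategy are assumptions made for the example (a real system would more likely use a trained segmenter and tagger), and the Chinese sample sentence is an assumed reconstruction of a sentence about registered and admitted examinees.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon: word -> part-of-speech labels (label names are assumptions).
LEXICON: Dict[str, Set[str]] = {
    "报名": {"verb"},
    "考生": {"noun"},
    "千万": {"number", "adverb"},  # a field may carry more than one label
    "录取": {"verb"},
    "万": {"number"},
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)
ARABIC_RUN = re.compile(r"[0-9]+")


def segment_and_tag(text: str) -> List[Tuple[str, Set[str]]]:
    """Split text into fields and attach one or more labels to each field."""
    fields: List[Tuple[str, Set[str]]] = []
    i = 0
    while i < len(text):
        # A run of Arabic digits becomes a single number field.
        digits = ARABIC_RUN.match(text, i)
        if digits:
            fields.append((digits.group(), {"number"}))
            i = digits.end()
            continue
        # Otherwise try the longest lexicon word starting at position i.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            word = text[i:i + length]
            if word in LEXICON:
                fields.append((word, set(LEXICON[word])))
                i += length
                break
        else:
            # Unknown single character (punctuation and the like): character label.
            fields.append((text[i], {"character"}))
            i += 1
    return fields


for field, labels in segment_and_tag("报名考生1千万，录取考生200万。"):
    print(field, sorted(labels))
```

Storing the labels as a set keeps ambiguous fields (number or adverb) visible to later steps instead of forcing an early decision.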
Optionally, take the instruction text "10 million examinees registered, 2 million examinees admitted." as an example, in which the amounts are written as "1" followed by the Chinese word for "ten million" and "200" followed by the Chinese character for "ten thousand". Fig. 3 shows one possible segmentation result. After the instruction text is obtained, it is segmented into multiple fields: "register", "examinee", "1", "ten million", ",", "admit", "examine", "life", "200", "ten thousand", and ".". Here "ten million" (which in Chinese can also be read as an adverb) has two part-of-speech labels, number or adverb, and "life" likewise has two, verb or adjective. A part-of-speech label is added to each of these fields so that the fields can be told apart.
Optionally, determining the target instruction segments from the instruction segment set according to the part-of-speech labels may be, but is not limited to: obtaining, from the instruction segment set, the instruction segments whose part-of-speech label indicates a number, as the target instruction segments, wherein the instruction segments whose part-of-speech label indicates a number contain number information. The number information includes valid number information and invalid number information. Valid number information is a number with mathematical meaning, such as a number indicating a quantity (1,000,000 or 7000), a number indicating a year ("1998" in "the year 1998"), or a number indicating a distance ("50" in "50 kilometers"). Invalid number information is a number without mathematical meaning, such as the "seven" and "eight" in a Chinese idiom meaning "agitated", which carry no mathematical meaning.
Optionally, obtaining the instruction segments whose part-of-speech label indicates a number as the target instruction segments may be, but is not limited to: when at least one part-of-speech label of an instruction segment is the number label, obtaining that instruction segment as a target instruction segment.
For example, continue with the instruction text "10 million examinees registered, 2 million examinees admitted." After the segmentation shown in Fig. 3, the four instruction segments "1", "ten million", "200", and "ten thousand" can be obtained. These four instruction segments are used as the target instruction segments, and the target number is obtained from them.
Optionally, extracting, from the instruction text, the target number that matches the valid number information according to the positional relationship between the target instruction segments in the instruction segment set includes: obtaining the number format of the numbers carried in all the valid number information in the instruction segment set; when the number format includes Chinese-character numerals, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments in the instruction segment set; and extracting the target number according to the extraction mode.
For example, take the instruction text "Just in case a failure occurs, the loss will exceed 10 million yuan", in which the amount is written as "1000" followed by the Chinese character for "ten thousand". In Chinese, the word for "just in case" also contains the character for "ten thousand". After the instruction text is segmented and tagged, "just in case" is a word with no numeric meaning and does not need to be extracted, whereas the "ten thousand" after "1000" is meaningful and does need to be extracted. The extraction mode for such Chinese-character numerals therefore has to be determined according to the positional relationship between the target instruction segments.
Optionally, the target number that matches the valid number information may be extracted from the instruction text in the following ways:
(1) When at least two target instruction segments are at consecutive positions in the instruction segment set, and the data type of the valid number information in the at least two target instruction segments is the integer type, the extraction mode for the at least two target instruction segments is determined to be the combination extraction mode. According to the combination extraction mode, the at least two target instruction segments are combined to obtain a combined instruction field, and the target number that matches the combined instruction field is extracted.
For example, continue with the instruction text "Just in case a failure occurs, the loss will exceed 10 million yuan". After the instruction text is segmented and tagged, the two target instruction segments "1000" and "ten thousand" are detected to be at consecutive positions, so "1000" and "ten thousand" are combined into "10 million" and extracted as the target number.
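One possible way to turn a combined instruction field into a machine-readable value is sketched below. It is a simplified illustration, not the method claimed here: the `UNITS` table is an assumed subset of Chinese magnitude characters, the function name `combine` is hypothetical, and full Chinese numerals written out digit by digit are not handled.

```python
from typing import List, Optional

# Chinese magnitude characters and their multipliers (an assumed, partial table).
UNITS = {"十": 10, "百": 100, "千": 1_000, "万": 10_000, "亿": 100_000_000}


def combine(fields: List[str]) -> int:
    """Combine adjacent integer fields, e.g. ["1000", "万"], into one integer."""
    value: Optional[int] = None
    for field in fields:
        if field.isascii() and field.isdigit():
            # An Arabic integer starts the value; two in a row is not supported here.
            if value is not None:
                raise ValueError(f"unexpected second Arabic field: {field!r}")
            value = int(field)
        elif field and all(ch in UNITS for ch in field):
            # A magnitude word such as "万" or "千万" scales what came before it.
            multiplier = 1
            for ch in field:
                multiplier *= UNITS[ch]
            value = (1 if value is None else value) * multiplier
        else:
            raise ValueError(f"not an integer field: {field!r}")
    if value is None:
        raise ValueError("no fields to combine")
    return value


print(combine(["1000", "万"]))
print(combine(["1", "千万"]))
```

Raising an error on anything that is not an integer field reflects the condition in (1) that combination only applies when every segment in the run is of the integer type.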
(2) When the target instruction segments are at non-consecutive (discrete) positions in the instruction segment set, the extraction mode is determined to be the discrete extraction mode. According to the discrete extraction mode, the number carried by the valid number information in each target instruction segment in the instruction segment set is extracted separately as a target number.
(3) When the number format is Arabic numerals, the number carried by the valid number information is extracted as the target number.
Optionally, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments in the instruction segment set, the method further includes: obtaining a first key segment and a second key segment in the instruction segment set, wherein the first key segment is adjacent to and located before the target instruction segment, and the second key segment is adjacent to and located after the target instruction segment; combining the first key segment, the target instruction segment, and the second key segment to obtain a candidate field; calling a composite number template and comparing it with the candidate field; and, when the candidate field matches the composite number template, extracting the target number according to the composite number template.
Optionally, the composite number template may be, but is not limited to, a fraction template, a percentage template, a decimal template, a negative number template, and the like.
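The comparison of a candidate field against composite number templates could be sketched as below. This is an assumption-laden illustration: the templates are written as regular expressions over Arabic notation, the `TEMPLATES` table and `match_composite` function are hypothetical names, and the text does not specify how the actual templates are represented.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple, Union

Number = Union[int, float, Fraction]

# Hypothetical templates: (name, pattern, converter). Order matters; first match wins.
TEMPLATES = [
    ("fraction", re.compile(r"(\d+)/(0*[1-9]\d*)"),
     lambda m: Fraction(int(m[1]), int(m[2]))),
    ("percentage", re.compile(r"(-?\d+(?:\.\d+)?)%"),
     lambda m: float(m[1]) / 100),
    ("decimal", re.compile(r"(-?\d+\.\d+)"),
     lambda m: float(m[1])),
    ("negative", re.compile(r"-(\d+)"),
     lambda m: -int(m[1])),
]


def match_composite(first_key: str, target: str,
                    second_key: str) -> Optional[Tuple[str, Number]]:
    """Join the neighbouring fields into a candidate and compare it with each template."""
    candidate = first_key + target + second_key
    for name, pattern, convert in TEMPLATES:
        m = pattern.fullmatch(candidate)
        if m:
            return name, convert(m)
    return None  # no template matched; fall back to the other extraction modes


print(match_composite("-", "5", ""))
print(match_composite("", "25", "%"))
print(match_composite("about", "5", "items"))
```

Requiring a full match of the candidate field means that ordinary neighbours such as words or units simply fail every template, so the caller can continue with combination or discrete extraction.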
The digital extraction method is described as a whole below. Fig. 4 is a schematic diagram of an optional display interface of a terminal. Two buttons are shown on the display interface. One is an input button for inputting the instruction text: after the input button is pressed, voice input information is collected, converted to text information, and displayed. The other is an extraction button: after the extraction button is pressed, the collected voice input information is used as the instruction text, and the target number in the instruction text is extracted. Optionally, a selection instruction may be received when the instruction text is obtained, and the selected voice input information is used as the instruction text. As shown in Fig. 5, the underlined voice input information is the selected voice input information. After the extraction button is detected to be pressed, "Profit exceeds 5 million, up ten percent year on year" is used as the instruction text.
After the instruction text is obtained, it is segmented and tagged, which yields the number information "500", "ten thousand", "hundred", and "ten" (in Chinese, "ten percent" is written as "ten out of a hundred", which is where "hundred" and "ten" come from). Because the two pieces of number information "500" and "ten thousand" are at adjacent positions, they are combined into "5 million", which is extracted and saved as the target number. Alternatively, after "5 million" is extracted as the target number, it is converted to "5000000" and saved. The "ten percent" is likewise a target number that needs to be extracted, so it is compared with a preset composite number template. The preset composite number template is "**/**". After the comparison, "ten percent" is extracted as the target number. Optionally, after "ten percent" is extracted, its format may be, but is not limited to being, converted, for example into the decimal 0.1, before it is saved.
With this embodiment, because the instruction text is first segmented and tagged, and each instruction segment in the resulting instruction segment set carries a part-of-speech label, the target instruction segments can be picked out by label and the target number can then be extracted according to the positional relationship between those segments. The target number can thus be extracted accurately and efficiently, which improves extraction accuracy.
As an optional embodiment, extracting, from the instruction text, the target number that matches the valid number information according to the positional relationship between the target instruction segments in the instruction segment set includes:
S1, obtaining the number format of the numbers carried in all the valid number information in the instruction segment set;
S2, when the number format includes Chinese-character numerals, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments in the instruction segment set;
S3, extracting the target number according to the extraction mode.
For example, take the instruction text "in case a fault occurs, the loss will exceed 10,000,000 yuan". Fig. 6 shows one possible result of segmenting and tagging this instruction text. As can be seen, "ten thousand", "one", "1000" and "ten thousand" are tagged as numerals. However, "in case", formed by the leading "ten thousand" and "one", is not a meaningful number, so it is not extracted.
Through this embodiment, when the instruction word-segment set contains Chinese-character numerals, the extraction mode for those numerals is determined according to the positional relationship between the target instruction segments in the set, which improves the flexibility and accuracy of target-number extraction.
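Steps S1 to S3 can be sketched as a small dispatch on number format and position. The sketch below is an illustration only: the `NumberToken` type, the format labels "chinese" and "arabic", and the mode names are assumptions, and it presumes that numerals without numeric meaning (such as those in "in case") have already been filtered out.

```python
from dataclasses import dataclass


@dataclass
class NumberToken:
    text: str      # token text, e.g. "1000" or "ten thousand"
    position: int  # index of the token in the segmented instruction text
    fmt: str       # assumed labels: "chinese" or "arabic"


def choose_mode(tokens: list[NumberToken]) -> str:
    """Pick an extraction mode from the number format and token positions."""
    # S1: inspect the number format of every target token.
    if all(t.fmt == "arabic" for t in tokens):
        return "direct"
    # S2: Chinese-character numerals present, so look at the positions.
    positions = sorted(t.position for t in tokens)
    contiguous = all(b - a == 1 for a, b in zip(positions, positions[1:]))
    return "combination" if len(tokens) >= 2 and contiguous else "discrete"


tokens = [NumberToken("1000", 7, "chinese"), NumberToken("ten thousand", 8, "chinese")]
print(choose_mode(tokens))  # combination
```

Step S3 then hands the tokens to whichever extraction routine the chosen mode names.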
As an optional embodiment:
S1, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments contained in the instruction word-segment set includes: when at least two target instruction segments occupy contiguous positions in the instruction word-segment set, and the data type of the valid numeric information contained in those segments is integer, determining that the extraction mode for the at least two target instruction segments is the combination extraction mode;
S2, extracting the target number according to the extraction mode includes: combining the at least two target instruction segments according to the combination extraction mode to obtain a combined instruction field, and extracting the target number that matches the combined instruction field.
For example, continuing with the instruction text "in case a fault occurs, the loss will exceed 10,000,000 yuan": after the instruction text is segmented and tagged as shown in Fig. 7, "1000" and "ten thousand" are extracted and combined to obtain "10,000,000", which is used as the target number.
Through this embodiment, the target number in the instruction word-segment set is extracted according to the combination extraction mode, so that it can be extracted accurately and efficiently as the actual situation requires, which improves the efficiency of target-number extraction.
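A minimal sketch of the combination extraction mode is given below. It checks the two conditions named above (contiguous positions, integer-valued segments) and then merges the segments. The unit table, the function names, and the rule that a count followed by its unit is multiplied are assumptions made for illustration; the tokens are the English renderings used in this text rather than the original characters.

```python
from typing import Optional

# Assumed unit table for the rendered tokens used in this text.
UNIT_VALUES = {"hundred": 100, "thousand": 1_000, "ten thousand": 10_000}


def to_int(text: str) -> Optional[int]:
    """Return the integer a token stands for, or None if it is not an integer."""
    if text in UNIT_VALUES:
        return UNIT_VALUES[text]
    return int(text) if text.isdigit() else None


def combine(tokens: list[tuple[int, str]]) -> Optional[int]:
    """Combine (position, text) tokens when contiguous and all integer-valued."""
    positions = [pos for pos, _ in tokens]
    values = [to_int(text) for _, text in tokens]
    contiguous = all(b - a == 1 for a, b in zip(positions, positions[1:]))
    if len(tokens) < 2 or not contiguous or None in values:
        return None
    result = 1
    for value in values:
        result *= value  # assumed rule: a count followed by its unit multiplies
    return result


print(combine([(7, "1000"), (8, "ten thousand")]))  # 10000000
```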
As an optional embodiment:
S1, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments contained in the instruction word-segment set includes: when the target instruction segments occupy discrete (non-contiguous) positions in the instruction word-segment set, determining that the extraction mode is the discrete extraction mode;
S2, extracting the target number according to the extraction mode includes: according to the discrete extraction mode, separately extracting the number carried by the valid numeric information of each target instruction segment in the instruction word-segment set, as the target number.
For example, take an instruction text that states a profit of "20.3 ten thousand" yuan. Because "20" and "30,000" are not contiguous in the instruction text, they need to be extracted in the discrete extraction mode, yielding the target number "20.3 ten thousand".
Through this embodiment, the target number in the instruction word-segment set is extracted according to the discrete extraction mode, so that it can be extracted accurately and efficiently as the actual situation requires, which improves the efficiency of target-number extraction.
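Following the wording of step S2 (each number is extracted separately), the discrete extraction mode can be sketched as below. The function names and the input shape, a list of (position, digit string) pairs, are hypothetical, and the tokens are assumed to have been rendered as digit strings already.

```python
def is_discrete(positions: list[int]) -> bool:
    """True when no two target positions are adjacent."""
    ordered = sorted(positions)
    return all(b - a > 1 for a, b in zip(ordered, ordered[1:]))


def extract_discrete(tokens: list[tuple[int, str]]) -> list[int]:
    """Return the number carried by each (position, digits) token, in text order."""
    if not is_discrete([pos for pos, _ in tokens]):
        raise ValueError("adjacent tokens call for combination extraction instead")
    return [int(digits) for _, digits in sorted(tokens)]


print(extract_discrete([(4, "7"), (1, "3")]))  # [3, 7]
```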
As an optional embodiment, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments contained in the instruction word-segment set, the method further includes:
S1, obtaining a first key segment and a second key segment in the instruction word-segment set, where the first key segment is adjacent to and located before the target instruction segment, and the second key segment is adjacent to and located after the target instruction segment;
S2, combining the first key segment, the target instruction segment and the second key segment to obtain a candidate field;
S3, calling a composite number template and comparing it with the candidate field;
S4, when the candidate field matches the composite number template, extracting the target number according to the composite number template.
Optionally, the first key segment and the second key segment may be, but are not limited to, words that carry meaning, for example words used to express numbers such as fractions, decimals and negative numbers.
For example, take the instruction text "one eighth of the goods were sold". Fig. 8 is a schematic diagram of several optional composite number templates. After "one eighth" is matched against the composite number templates, the target number is extracted using the "**/**" template, giving one eighth. Once one eighth has been obtained, it may be converted to other formats.
Through this embodiment, the target number is extracted by calling a composite number template, so that numeric information in complex cases can be extracted, which improves the flexibility and accuracy of target-number extraction.
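Steps S1 to S4 can be illustrated with a three-token window and one regular-expression template. This is only a sketch: the key words "minus" and "percent" are hypothetical English stand-ins for the key segments, the percentage pattern stands in for a composite number template such as "**/**", and the function names are assumptions.

```python
import re
from typing import Optional

# Hypothetical percentage template: an optional "minus", digits, then "percent".
PERCENT_TEMPLATE = re.compile(r"(minus )?(\d+) percent$")


def candidate_field(tokens: list[str], i: int) -> str:
    """Join the segment before index i, the target at i, and the segment after it."""
    return " ".join(tokens[max(i - 1, 0): i + 2])


def extract_percent(candidate: str) -> Optional[float]:
    """Return the decimal value if the candidate matches the template, else None."""
    match = PERCENT_TEMPLATE.search(candidate)
    if match is None:
        return None
    value = int(match.group(2)) / 100
    return -value if match.group(1) else value


tokens = ["up", "10", "percent"]
print(candidate_field(tokens, 1))                   # up 10 percent
print(extract_percent(candidate_field(tokens, 1)))  # 0.1
```

A candidate field that matches no template falls through to the position-based extraction modes.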
As an optional embodiment, after the number format of the numbers carried in the valid numeric information is obtained, the method further includes:
S1, when the number format is Arabic numerals, extracting the number carried by the valid numeric information as the target number.
For example, take the instruction text "23 degrees north latitude, 67 degrees east longitude". After the instruction text is obtained, because the numbers in it are in Arabic-numeral format, the Arabic numerals in the instruction text can be extracted directly to obtain the target numbers.
Through this embodiment, Arabic numerals are extracted directly, so that when the number format is Arabic numerals the target number is extracted accurately and efficiently, which improves the efficiency of target-number extraction.
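Direct extraction of Arabic numerals can be sketched with a regular expression. The pattern and the function name `extract_arabic` are illustrative assumptions; the patent does not specify how the digits are located.

```python
import re

# Runs of digits with an optional decimal part.
ARABIC_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_arabic(text: str) -> list[str]:
    """Return every Arabic-numeral run found in the text, in order."""
    return ARABIC_NUMBER.findall(text)


print(extract_arabic("23 degrees north latitude, 67 degrees east longitude"))
# ['23', '67']
```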
As an optional embodiment, determining the target instruction segments from the instruction word-segment set according to the part-of-speech tags includes:
S1, obtaining from the instruction word-segment set the instruction segments whose part-of-speech tag indicates a numeral, as the target instruction segments, where the instruction segments whose part-of-speech tag indicates a numeral contain valid numeric information.
For example, continuing with the instruction text "23 degrees north latitude, 67 degrees east longitude": after this text is segmented and tagged, the part of speech of "23" and "67" is found to be numeral. "23" and "67", whose part of speech is numeral, are then extracted as the target numbers.
Through this embodiment, the instruction segments whose part-of-speech tag is numeral are obtained from the instruction word-segment set as the target instruction segments, so that the target instruction segments can be extracted from the set, which improves the flexibility of obtaining them.
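Selecting the target instruction segments by part-of-speech tag amounts to a simple filter. In the sketch below the tag names ("NUM", "NOUN", "UNIT") and the function name `select_targets` are hypothetical; any tag set with a distinct numeral tag would serve.

```python
def select_targets(tagged: list[tuple[str, str]]) -> list[str]:
    """Keep the segments whose part-of-speech tag is the numeral tag."""
    return [text for text, tag in tagged if tag == "NUM"]


tagged = [
    ("north latitude", "NOUN"), ("23", "NUM"), ("degrees", "UNIT"),
    ("east longitude", "NOUN"), ("67", "NUM"), ("degrees", "UNIT"),
]
print(select_targets(tagged))  # ['23', '67']
```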
As an optional embodiment, obtaining the instruction text that matches the input query instruction includes at least one of the following:
(1) obtaining a query instruction input by voice; recognizing the instruction information carried in the query instruction; generating the instruction text according to the instruction information;
(2) obtaining a query instruction input through an input device; parsing the query instruction to obtain the instruction text.
For example, Fig. 4 is a schematic diagram of the display interface of an optional terminal. Two buttons are shown on the display interface. One is an input button for entering the instruction text: after the input button is pressed, voice input is captured, converted to text and displayed. The other is an extraction button: after the extraction button is pressed, the captured voice input is used as the instruction text and the target number in the instruction text is extracted. Optionally, a selection instruction may be received when the instruction text is obtained, and the selected voice input is used as the instruction text. As shown in Fig. 5, the underlined voice input is the selected voice input. After it is detected that the extraction button has been pressed, "profit exceeds 5,000,000, up ten percent year on year" is used as the instruction text.
Through this embodiment, the instruction text is obtained by any one of the above methods, which improves the flexibility of obtaining the instruction text.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
According to another aspect of the embodiments of the present invention, a number extraction apparatus for implementing the above number extraction method is further provided. As shown in Fig. 9, the apparatus includes:
(1) an acquiring unit 902, configured to obtain the instruction text that matches an input query instruction;
(2) a processing unit 904, configured to perform word segmentation and tagging on the instruction text to obtain an instruction word-segment set, where each instruction segment in the set is configured with a part-of-speech tag;
(3) a determining unit 906, configured to determine target instruction segments from the instruction word-segment set according to the part-of-speech tags, where the target instruction segments contain valid numeric information;
(4) an extraction unit 908, configured to extract from the instruction text, according to the positional relationship between the target instruction segments contained in the instruction word-segment set, the target number that matches the valid numeric information, where the target number is a number that a machine can recognize.
Optionally, the above number extraction apparatus may be, but is not limited to being, applied to compiling financial reports, to land-area assessment, or to a population census.
Optionally, the above number extraction apparatus may be, but is not limited to being, applied on an intelligent terminal, for example a mobile phone.
With the above apparatus, when a target number is extracted, the instruction text is first segmented and tagged to obtain the instruction word-segment set, and a part-of-speech tag is configured for each instruction segment in the set. The target instruction segments can then be picked out by their part-of-speech tags, and the target number extracted according to the positional relationship between those segments. The target number is thus extracted precisely and efficiently, which improves the accuracy of target-number extraction.
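The division of labor among units 902 to 908 can be pictured as a four-stage pipeline. The class below is a toy sketch, not the patented apparatus: the class and method names are hypothetical, whitespace splitting stands in for real word segmentation, and a digit check stands in for real part-of-speech tagging.

```python
import re


class NumberExtractor:
    """Toy pipeline mirroring units 902-908."""

    def acquire(self, query: str) -> str:
        # Acquiring unit 902: here the query text is used directly.
        return query.strip()

    def process(self, text: str) -> list[tuple[str, str]]:
        # Processing unit 904: whitespace split stands in for segmentation.
        return [(tok, "NUM" if re.fullmatch(r"\d+", tok) else "OTHER")
                for tok in text.split()]

    def determine(self, tagged: list[tuple[str, str]]) -> list[tuple[int, str]]:
        # Determining unit 906: keep (position, token) for numeral-tagged segments.
        return [(i, tok) for i, (tok, tag) in enumerate(tagged) if tag == "NUM"]

    def extract(self, targets: list[tuple[int, str]]) -> list[int]:
        # Extraction unit 908: turn each target into a machine-readable integer.
        return [int(tok) for _, tok in targets]

    def run(self, query: str) -> list[int]:
        return self.extract(self.determine(self.process(self.acquire(query))))


print(NumberExtractor().run("land area 1000 mu"))  # [1000]
```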
Optionally, the instruction text that matches the input query instruction may be obtained in, but not limited to, the following ways:
(1) An input box is displayed on the display interface of the terminal; when content entered in the input box is received, the received content is used as the instruction text.
For example, an input box is displayed on the display interface of the terminal, and after the words "land area 1000 mu" are received in the input box, "land area 1000 mu" is used as the instruction text.
(2) A picture carrying the instruction text is received, text information is recognized from the picture, and the recognized text information is used as the instruction text.
For example, the terminal receives a picture carrying the words "land area 1000 mu", first performs recognition on the picture to identify the words "land area 1000 mu", then captures that text and uses the captured "land area 1000 mu" as the instruction text.
(3) After a selection instruction is received, the selected text is used as the instruction text.
Optionally, receiving the selection instruction may be, but is not limited to, a button on the display interface of the terminal being pressed, or a voice instruction input by the user being received.
For example, a button and text content may be displayed on the display interface of the terminal. When the button is pressed, the text content selected by the user is used as the instruction text and the subsequent number extraction process is performed.
(4) Voice input is obtained, and the obtained voice input is used as the instruction text.
For example, voice input from the user such as "land area 1000 mu" is received; the obtained voice information is then converted to text information and used as the instruction text.
Optionally, performing word segmentation and tagging on the instruction text may be, but is not limited to, splitting the obtained instruction text into multiple individual fields and adding a part-of-speech tag to each field.
Optionally, adding a part-of-speech tag to each field may be, but is not limited to, judging the part of speech of each field: a noun tag is added when the field is a noun; a numeral tag when it is a numeral; a verb tag when it is a verb; an adjective tag when it is an adjective; an adverb tag when it is an adverb; and a character tag when it is a character.
Optionally, each field may correspond to one or more part-of-speech tags.
Optionally, take the instruction text "10,000,000 examinees registered, 2,000,000 examinees admitted." as an example; Fig. 3 shows one possible segmentation result. After the instruction text is obtained, it is segmented into multiple fields such as "register", "examinee", "1", "ten million", ",", "admit", "examine", "life", "200", "ten thousand" and ".". Here "ten million" carries two part-of-speech tags, numeral or adverb, and "life" likewise carries two, verb or adjective. A part-of-speech tag is added to each of these fields so that the fields can be distinguished from one another.
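The idea that one field may carry several part-of-speech tags can be sketched with a lookup table that maps each field to a set of tags. The lexicon contents, the tag names and the "UNK" fallback for unknown fields are assumptions for illustration; a real system would rely on a trained tagger.

```python
# Hypothetical lexicon: field -> set of possible part-of-speech tags.
LEXICON = {
    "register": {"VERB"},
    "examinee": {"NOUN"},
    "1": {"NUM"},
    "ten million": {"NUM", "ADV"},  # two tags, as in the example above
    ",": {"CHAR"},
}


def tag_fields(fields: list[str]) -> list[tuple[str, set[str]]]:
    """Attach every known tag to each field; unknown fields get the UNK tag."""
    return [(field, LEXICON.get(field, {"UNK"})) for field in fields]


tagged = tag_fields(["register", "examinee", "1", "ten million", ","])
for field, tags in tagged:
    print(field, sorted(tags))
```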
Optionally, determining the target instruction segments from the instruction word-segment set according to the part-of-speech tags may be, but is not limited to: obtaining from the instruction word-segment set the instruction segments whose part-of-speech tag indicates a numeral, as the target instruction segments, where such segments contain numeric information. The numeric information includes valid numeric information and invalid numeric information. Valid numeric information is a number with mathematical meaning: a number expressing a quantity, such as 1,000,000 or 7000; a number expressing a year, such as "1998" in "the year 1998"; or a number expressing a distance, such as "50" in "50 kilometers". Invalid numeric information is a number without mathematical meaning, such as the "seven" and "eight" in the idiom rendered here as "agitated", which carry no mathematical meaning.
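The text does not say how valid and invalid numeric information are told apart. One simple possibility, shown purely as an assumption, is a stoplist of expressions whose numerals carry no mathematical meaning; the stoplist contents and the function name `valid_numerals` are hypothetical.

```python
# Hypothetical stoplist of expressions whose numerals are not real numbers.
NON_NUMERIC_EXPRESSIONS = {"agitated", "in case"}


def valid_numerals(candidates: list[tuple[str, str]]) -> list[str]:
    """Keep numerals whose containing expression is not on the stoplist.

    Each candidate is (numeral, expression the numeral occurs in).
    """
    return [numeral for numeral, expression in candidates
            if expression not in NON_NUMERIC_EXPRESSIONS]


candidates = [
    ("seven", "agitated"),
    ("eight", "agitated"),
    ("50", "50 kilometers"),
]
print(valid_numerals(candidates))  # ['50']
```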
Optionally, obtaining the instruction segments whose part-of-speech tag indicates a numeral as the target instruction segments may be, but is not limited to: when at least one part-of-speech tag of a segment is the numeral tag, obtaining the instruction segment corresponding to that numeral tag as a target instruction segment.
For example, continuing with the instruction text "10,000,000 examinees registered, 2,000,000 examinees admitted.": after the segmentation shown in Fig. 3, the four instruction segments "1", "ten million", "200" and "ten thousand" are obtained. These four segments are taken as the target instruction segments, and the target numbers are obtained from them.
Optionally, extracting from the instruction text the target number that matches the valid numeric information, according to the positional relationship between the target instruction segments contained in the instruction word-segment set, includes: obtaining the number format of the numbers carried in all the valid numeric information contained in the instruction word-segment set; when the number format includes Chinese-character numerals, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments contained in the set; and extracting the target number according to the extraction mode.
For example, take the instruction text "in case a fault occurs, the loss will exceed 10,000,000 yuan". After this instruction text is segmented and tagged, "in case" is a word with no numeric meaning and does not need to be extracted, whereas the "ten thousand" after "1000" is meaningful and needs to be extracted. The extraction mode for the Chinese-character numerals therefore needs to be determined according to the positional relationship between the target instruction segments.
Optionally, the target number that matches the valid numeric information may be extracted from the instruction text as follows:
(1) When at least two target instruction segments occupy contiguous positions in the instruction word-segment set, and the data type of the valid numeric information contained in those segments is integer, the extraction mode for the at least two target instruction segments is determined to be the combination extraction mode; according to the combination extraction mode, the at least two target instruction segments are combined to obtain a combined instruction field, and the target number that matches the combined instruction field is extracted.
For example, continuing with the instruction text "in case a fault occurs, the loss will exceed 10,000,000 yuan": after the instruction text is segmented and tagged, it is detected that the two target instruction segments "1000" and "ten thousand" occupy contiguous positions, so "1000" and "ten thousand" are combined into "10,000,000", which is extracted as the target number.
(2) When the target instruction segments occupy discrete positions in the instruction word-segment set, the extraction mode is determined to be the discrete extraction mode; according to the discrete extraction mode, the number carried by the valid numeric information of each target instruction segment in the set is extracted separately as the target number.
(3) When the number format is Arabic numerals, the number carried by the valid numeric information is extracted as the target number.
Optionally, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments contained in the instruction word-segment set, the method further includes: obtaining a first key segment and a second key segment in the instruction word-segment set, where the first key segment is adjacent to and located before the target instruction segment, and the second key segment is adjacent to and located after the target instruction segment; combining the first key segment, the target instruction segment and the second key segment to obtain a candidate field; calling a composite number template and comparing it with the candidate field; and, when the candidate field matches the composite number template, extracting the target number according to the composite number template.
Optionally, the composite number template may be, but is not limited to, a fraction template, a percentage template, a decimal template, a negative-number template, and so on.
The number extraction method is described as a whole below. Fig. 4 is a schematic diagram of the display interface of an optional terminal. Two buttons are shown on the display interface. One is an input button for entering the instruction text: after the input button is pressed, voice input is captured, converted to text and displayed. The other is an extraction button: after the extraction button is pressed, the captured voice input is used as the instruction text and the target number in the instruction text is extracted. Optionally, a selection instruction may be received when the instruction text is obtained, and the selected voice input is used as the instruction text. As shown in Fig. 5, the underlined voice input is the selected voice input. After it is detected that the extraction button has been pressed, "profit exceeds 5,000,000, up ten percent year on year" is used as the instruction text.
After the instruction text is obtained, it is segmented and tagged, yielding the numeric information "500", "ten thousand", "hundred" and "ten". Because "500" and "ten thousand" occupy adjacent positions, they are combined into "5,000,000" (500 ten-thousands), which is extracted and saved as a target number. Alternatively, after "5,000,000" is extracted as the target number, it is converted to "5000000" and saved. "Ten percent" is likewise a target number to be extracted, so it is compared with the preset composite number template, which here is "**/**". After the comparison, "ten percent" is extracted as a target number. Optionally, after "ten percent" is extracted, it may be, but is not limited to being, reformatted, for example converted to the decimal 0.1 and saved.
Through this embodiment, when a target number is extracted, the instruction text is first segmented and tagged to obtain the instruction word-segment set, and a part-of-speech tag is configured for each instruction segment in the set. The target instruction segments can then be picked out by their part-of-speech tags, and the target number extracted according to the positional relationship between those segments. The target number is thus extracted precisely and efficiently, which improves the accuracy of target-number extraction.
As an optional embodiment, the extraction unit includes:
(1) a first obtaining module, configured to obtain the number format of the numbers carried in all the valid numeric information contained in the instruction word-segment set;
(2) a determining module, configured to, when the number format includes Chinese-character numerals, determine the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments contained in the instruction word-segment set;
(3) a first extraction module, configured to extract the target number according to the extraction mode.
For example, take the instruction text "in case a fault occurs, the loss will exceed 10,000,000 yuan". Fig. 6 shows one possible result of segmenting and tagging this instruction text. As can be seen, "ten thousand", "one", "1000" and "ten thousand" are tagged as numerals. However, "in case", formed by the leading "ten thousand" and "one", is not a meaningful number, so it is not extracted.
Through this embodiment, when the instruction word-segment set contains Chinese-character numerals, the extraction mode for those numerals is determined according to the positional relationship between the target instruction segments in the set, which improves the flexibility and accuracy of target-number extraction.
As an optional embodiment:
(1) the determining module includes a first determining submodule, configured to, when at least two target instruction segments occupy contiguous positions in the instruction word-segment set and the data type of the valid numeric information contained in those segments is integer, determine that the extraction mode for the at least two target instruction segments is the combination extraction mode;
(2) the first extraction module includes a first extraction submodule, configured to combine the at least two target instruction segments according to the combination extraction mode to obtain a combined instruction field, and to extract the target number that matches the combined instruction field.
For example, continuing with the instruction text "in case a fault occurs, the loss will exceed 10,000,000 yuan": after the instruction text is segmented and tagged as shown in Fig. 7, "1000" and "ten thousand" are extracted and combined to obtain "10,000,000", which is used as the target number.
Through this embodiment, the target number in the instruction word-segment set is extracted according to the combination extraction mode, so that it can be extracted accurately and efficiently as the actual situation requires, which improves the efficiency of target-number extraction.
As an optional embodiment:
(1) the determining module includes a second determining submodule, configured to determine that the extraction mode is the discrete extraction mode when the target instruction segments occupy discrete positions in the instruction word-segment set;
(2) the first extraction module includes a second extraction submodule, configured to separately extract, according to the discrete extraction mode, the number carried by the valid numeric information of each target instruction segment in the instruction word-segment set, as the target number.
For example, take an instruction text that states a profit of "20.3 ten thousand" yuan. Because "20" and "30,000" are not contiguous in the instruction text, they need to be extracted in the discrete extraction mode, yielding the target number "20.3 ten thousand".
Through this embodiment, the target number in the instruction word-segment set is extracted according to the discrete extraction mode, so that it can be extracted accurately and efficiently as the actual situation requires, which improves the efficiency of target-number extraction.
As an optional embodiment, the extraction unit further includes:
(1) a second obtaining module, configured to obtain, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments contained in the instruction word-segment set, a first key segment and a second key segment in the instruction word-segment set, where the first key segment is adjacent to and located before the target instruction segment, and the second key segment is adjacent to and located after the target instruction segment;
(2) a combining module, configured to combine the first key segment, the target instruction segment and the second key segment to obtain a candidate field;
(3) a comparison module, configured to call a composite number template and compare it with the candidate field;
(4) a second extraction module, configured to extract the target number according to the composite number template when the candidate field matches the composite number template.
Optionally, the first key segment and the second key segment may be, but are not limited to, words that carry meaning, for example words used to express numbers such as fractions, decimals and negative numbers.
For example, take the instruction text "one eighth of the goods were sold". Fig. 8 is a schematic diagram of several optional composite number templates. After "one eighth" is matched against the composite number templates, the target number is extracted using the "**/**" template, giving one eighth. Once one eighth has been obtained, it may be converted to other formats.
Through this embodiment, the target number is extracted by calling a composite number template, so that numeric information in complex cases can be extracted, which improves the flexibility and accuracy of target-number extraction.
As an optional embodiment, the extraction unit further includes:
(1) a third extraction module, configured to, after the number format of the numbers carried in the valid numeric information is obtained and when that number format is Arabic numerals, extract the number carried by the valid numeric information as the target number.
For example, with instruction text be " 23 degree of north latitude, 67 degree of east longitude " for, after getting above-metioned instruction text, due to
Number format in above-metioned instruction text is Arabic numerals, therefore, can be directly to the Arab in above-metioned instruction text
Number extracts, and obtains target number.
Through this embodiment, by directly extracting Arabic numerals, so as to be Arabic numerals in number format
In the case of, target number is accurately and efficiently extracted, the efficiency for extracting target number is improved.
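As a rough illustration of this direct case, the sketch below pulls Arabic numerals out of the instruction text with a regular expression. The function name `extract_arabic` and the pattern are assumptions made for the example; signs, thousands separators and units are deliberately not handled.

```python
import re

# Matches an integer or a simple decimal written in Arabic numerals.
ARABIC_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_arabic(text):
    """Return the Arabic numerals found in the text, in order of appearance."""
    return [float(s) if "." in s else int(s) for s in ARABIC_NUMBER.findall(text)]


if __name__ == "__main__":
    print(extract_arabic("23 degrees north latitude, 67 degrees east longitude"))
```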
As an optional embodiment, the determination unit includes:
(1) a third acquisition module, configured to obtain, from the instruction participle set, the instruction participles whose part-of-speech tag indicates a numeral as the target instruction participles, wherein an instruction participle whose part-of-speech tag indicates a numeral contains valid number information.
For example, continue with the instruction text "23 degrees north latitude, 67 degrees east longitude". After this text is segmented and tagged, the part of speech of "23" and "67" is found to be numeral. The participles "23" and "67", whose part of speech is numeral, are then extracted as the target numbers.
Through this embodiment, the instruction participles whose part-of-speech tag is numeral are obtained from the instruction participle set as the target instruction participles, so that the target instruction participles can be picked out of the instruction participle set, which improves the flexibility of obtaining them.
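The selection by part-of-speech tag might look like the sketch below. It assumes the segmentation and tagging step has already produced (participle, tag) pairs; the tag value `"m"` for numerals, the other tags and the names `TaggedToken`, `tag_tokens` and `select_target_tokens` are illustrative assumptions rather than anything fixed by the text.

```python
from typing import List, NamedTuple, Tuple


class TaggedToken(NamedTuple):
    text: str   # the instruction participle
    pos: str    # its part-of-speech tag
    index: int  # its position in the instruction participle set


NUMERAL_TAG = "m"  # assumption: the tagger marks numerals with "m"


def tag_tokens(pairs: List[Tuple[str, str]]) -> List[TaggedToken]:
    """Attach positions to (participle, tag) pairs from an upstream tagger."""
    return [TaggedToken(text, pos, i) for i, (text, pos) in enumerate(pairs)]


def select_target_tokens(tagged: List[TaggedToken]) -> List[TaggedToken]:
    """Keep only the participles whose tag marks them as numerals."""
    return [t for t in tagged if t.pos == NUMERAL_TAG]


if __name__ == "__main__":
    # Hand-tagged stand-in for the tagger output on the example text.
    pairs = [("23", "m"), ("degrees", "q"), ("north latitude", "n"),
             (",", "x"), ("67", "m"), ("degrees", "q"), ("east longitude", "n")]
    for token in select_target_tokens(tag_tokens(pairs)):
        print(token.index, token.text)
```

Keeping the position alongside each participle matters later, because the extraction mode is chosen from the positional relationship between target instruction participles.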
As an optional embodiment, the acquisition unit includes at least one of:
(1) a fourth acquisition module, configured to obtain an inquiry instruction input by voice, identify the instruction information carried in the inquiry instruction, and generate the instruction text according to the instruction information;
(2) a fifth acquisition module, configured to obtain an inquiry instruction input through an input device, and parse the inquiry instruction to obtain the instruction text.
For example, Fig. 4 is a schematic diagram of the display interface of an optional terminal. Two buttons are displayed on the display interface. One is an input button for entering the instruction text: after the input button is pressed, voice input information is collected, converted into text information and displayed. The other is an extraction button: after the extraction button is pressed, the collected voice input information is used as the instruction text, and the target number in the instruction text is extracted. Optionally, a selection instruction may be received when the instruction text is obtained, and the selected voice input information is used as the instruction text. As shown in Fig. 5, the underlined voice input information is the selected voice input information. After it is detected that the extraction button has been pressed, "profit exceeded 5,000,000, a year-on-year increase of 10" is used as the instruction text.
Through this embodiment, the instruction text is obtained by any one of the above methods, which improves the flexibility of obtaining the instruction text.
According to another aspect of the embodiments of the present invention, an electronic device for implementing the above digital extraction method is further provided. As shown in Fig. 10, the electronic device includes a memory 1002 and a processor 1004. A computer program is stored in the memory 1002, and the processor 1004 is configured to execute, through the computer program, the steps in any one of the above method embodiments.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
S1, obtain the instruction text that matches the input inquiry instruction;
S2, perform segmentation and tagging on the instruction text to obtain an instruction participle set, wherein each instruction participle in the instruction participle set is configured with a part-of-speech tag;
S3, determine the target instruction participles from the instruction participle set according to the part-of-speech tags, wherein the target instruction participles contain valid number information;
S4, according to the positional relationship between the target instruction participles included in the instruction participle set, extract from the instruction text the target number that matches the valid number information, wherein the target number is a number that can be recognized by a machine.
Optionally, those of ordinary skill in the art will understand that the structure shown in Fig. 10 is only illustrative. The electronic device may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID) or a PAD. Fig. 10 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in Fig. 10 (such as a network interface), or have a configuration different from that shown in Fig. 10.
The memory 1002 may be used to store software programs and modules, such as the program instructions/modules corresponding to the digital extraction method and apparatus in the embodiments of the present invention. The processor 1004 runs the software programs and modules stored in the memory 1002 to perform various functional applications and data processing, that is, to implement the above digital extraction method. The memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 1002 may further include memory located remotely from the processor 1004, and such remote memory may be connected to the terminal through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof. Specifically, the memory 1002 may be used for, but is not limited to, storing information such as the instruction text and target data. As an example, as shown in Fig. 10, the memory 1002 may include, but is not limited to, the acquisition unit 902, the processing unit 904, the determination unit 906 and the extraction unit 908 of the above digital extraction apparatus. It may also include, but is not limited to, other module units of the digital extraction apparatus, which are not described again in this example.
Optionally, the transmission device 1006 is used to receive or send data via a network. Specific examples of the network may include wired networks and wireless networks. In one example, the transmission device 1006 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router through a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device 1006 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
In addition, the electronic device further includes: a display 1008, for displaying content such as the target number; and a connection bus 1010, for connecting the module components of the electronic device.
According to yet another aspect of the embodiments of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the steps in any one of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtain the instruction text that matches the input inquiry instruction;
S2, perform segmentation and tagging on the instruction text to obtain an instruction participle set, wherein each instruction participle in the instruction participle set is configured with a part-of-speech tag;
S3, determine the target instruction participles from the instruction participle set according to the part-of-speech tags, wherein the target instruction participles contain valid number information;
S4, according to the positional relationship between the target instruction participles included in the instruction participle set, extract from the instruction text the target number that matches the valid number information, wherein the target number is a number that can be recognized by a machine.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtain the number format of the numbers carried in all the valid number information included in the instruction participle set;
S2, in the case where the number format includes Chinese numerals, determine the extraction mode of the Chinese numerals according to the positional relationship between the target instruction participles included in the instruction participle set;
S3, extract the target number according to the extraction mode.
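Steps S1 and S2 above can be sketched as a small decision function. Everything named here (`number_format`, `extraction_mode`, the `Mode` enum and the character set used to detect Chinese numerals) is an assumption for illustration. The sketch also simplifies the rule: it judges all target instruction participles together and does not check that the combined numbers are integers.

```python
import re
from enum import Enum
from typing import List, Tuple

# Assumption: these characters are enough to recognize a Chinese numeral.
CN_NUMERAL = re.compile(r"[零一二两三四五六七八九十百千万亿]")


class Mode(Enum):
    DIRECT = "direct"      # Arabic numerals only, extract as-is
    COMBINED = "combined"  # adjacent Chinese numeral participles, join first
    DISCRETE = "discrete"  # separated participles, extract one by one


def number_format(token: str) -> str:
    """S1: classify the number format carried by one participle."""
    return "chinese" if CN_NUMERAL.search(token) else "arabic"


def extraction_mode(targets: List[Tuple[int, str]]) -> Mode:
    """S2: pick a mode from the positions of the target participles.

    `targets` holds (position in the participle set, participle text).
    """
    if not any(number_format(text) == "chinese" for _, text in targets):
        return Mode.DIRECT
    positions = sorted(index for index, _ in targets)
    contiguous = len(positions) >= 2 and all(
        b - a == 1 for a, b in zip(positions, positions[1:]))
    return Mode.COMBINED if contiguous else Mode.DISCRETE


if __name__ == "__main__":
    print(extraction_mode([(1, "五百"), (2, "万")]))  # adjacent positions
    print(extraction_mode([(0, "三"), (3, "五")]))    # separated positions
    print(extraction_mode([(0, "23"), (4, "67")]))    # Arabic numerals only
```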
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, in the case where the positions of at least two target instruction participles in the instruction participle set are continuous positions, and the data type of the valid number information included in the at least two target instruction participles is the integer type, determine that the extraction mode of the at least two target instruction participles is the combination extraction mode;
S2, according to the combination extraction mode, combine the at least two target instruction participles to obtain a combined instruction field, and extract the target number that matches the combined instruction field.
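A possible reading of the combination extraction mode is sketched below: adjacent participles are joined into one field, and the field is converted to an integer. The conversion routine `chinese_to_int` is a common textbook approach chosen for the example, not something specified in the text, and it does not handle stacked units such as "万亿" or malformed input beyond raising an error.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10 ** 4, "亿": 10 ** 8}


def chinese_to_int(field: str) -> int:
    """Convert an integer written in Chinese numerals to an int."""
    total = 0    # value of the sections already closed by 万 or 亿
    section = 0  # value inside the current section (below 10000)
    digit = 0    # digit waiting for its unit
    for ch in field:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as the leading 十 in 十五 counts as one unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a Chinese numeral character: {ch!r}")
    return total + section + digit


def extract_combined(target_tokens):
    """Join adjacent target participles into one field and convert it."""
    combined_field = "".join(target_tokens)
    return chinese_to_int(combined_field)


if __name__ == "__main__":
    print(extract_combined(["五百", "万"]))
    print(chinese_to_int("一百零五"))
```

Joining before converting is the point of the mode: "五百" and "万" on their own would give 500 and 10000, while the combined field gives a single number.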
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, in the case where the positions of the target instruction participles in the instruction participle set are discrete positions, determine that the extraction mode is the discrete extraction mode;
S2, according to the discrete extraction mode, extract separately the number carried by the valid number information included in each target instruction participle in the instruction participle set, as the target numbers.
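For the discrete extraction mode, each target instruction participle is converted on its own. The sketch below is illustrative only: the names `to_number` and `extract_discrete` are assumptions, the sample sentence is made up, and Chinese numerals are limited to single characters to keep the example short.

```python
# Assumption: single-character Chinese numerals only, to keep the sketch short.
CN_DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9, "十": 10}


def to_number(token: str) -> int:
    """Convert one participle that carries a number into an int."""
    if token.isdecimal():
        return int(token)        # Arabic numerals
    if token in CN_DIGITS:
        return CN_DIGITS[token]  # single Chinese numeral
    raise ValueError(f"unsupported number participle: {token!r}")


def extract_discrete(tokens, target_positions):
    """Extract the number of every target participle separately, in order."""
    return [to_number(tokens[i]) for i in target_positions]


if __name__ == "__main__":
    # Made-up sentence: "buy three apples and five pears".
    tokens = ["买", "三", "个", "苹果", "和", "五", "个", "梨"]
    print(extract_discrete(tokens, [1, 5]))
```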
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtain a first crucial participle and a second crucial participle in the instruction participle set, wherein the first crucial participle is adjacent to the target instruction participle and located before it, and the second crucial participle is adjacent to the target instruction participle and located after it;
S2, combine the first crucial participle, the target instruction participle and the second crucial participle to obtain a candidate field;
S3, call a composite number template and compare it with the candidate field;
S4, in the case where the candidate field matches the composite number template, extract the target number according to the composite number template.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following step:
S1, in the case where the number format is Arabic numerals, extract the number carried by the valid number information as the target number.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following step:
S1, obtain, from the instruction participle set, the instruction participles whose part-of-speech tag indicates a numeral as the target instruction participles, wherein an instruction participle whose part-of-speech tag indicates a numeral contains valid number information.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
(1) obtain an inquiry instruction input by voice; identify the instruction information carried in the inquiry instruction; generate the instruction text according to the instruction information;
(2) obtain an inquiry instruction input through an input device; parse the inquiry instruction to obtain the instruction text.
Optionally, in this embodiment, those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc or the like.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the advantages or disadvantages of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and there may be other ways of division in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (15)
1. A digital extraction method, characterized by comprising:
obtaining an instruction text that matches an input inquiry instruction;
performing segmentation and tagging on the instruction text to obtain an instruction participle set, wherein each instruction participle in the instruction participle set is configured with a part-of-speech tag;
determining target instruction participles from the instruction participle set according to the part-of-speech tags, wherein the target instruction participles contain valid number information;
extracting, according to the positional relationship between the target instruction participles included in the instruction participle set, a target number that matches the valid number information from the instruction text, wherein the target number is a number that can be recognized by a machine.
2. The method according to claim 1, characterized in that extracting, according to the positional relationship between the target instruction participles included in the instruction participle set, the target number that matches the valid number information from the instruction text comprises:
obtaining the number format of the numbers carried in all the valid number information included in the instruction participle set;
in the case where the number format includes Chinese numerals, determining the extraction mode of the Chinese numerals according to the positional relationship between the target instruction participles included in the instruction participle set;
extracting the target number according to the extraction mode.
3. The method according to claim 2, characterized in that
determining the extraction mode of the Chinese numerals according to the positional relationship between the target instruction participles included in the instruction participle set comprises: in the case where the positions of at least two target instruction participles in the instruction participle set are continuous positions, and the data type of the valid number information included in the at least two target instruction participles is the integer type, determining that the extraction mode of the at least two target instruction participles is a combination extraction mode;
extracting the target number according to the extraction mode comprises: combining, according to the combination extraction mode, the at least two target instruction participles to obtain a combined instruction field; and extracting the target number that matches the combined instruction field.
4. The method according to claim 2, characterized in that
determining the extraction mode of the Chinese numerals according to the positional relationship between the target instruction participles included in the instruction participle set comprises: in the case where the positions of the target instruction participles in the instruction participle set are discrete positions, determining that the extraction mode is a discrete extraction mode;
extracting the target number according to the extraction mode comprises: extracting separately, according to the discrete extraction mode, the number carried by the valid number information included in each target instruction participle in the instruction participle set, as the target number.
5. The method according to claim 2, characterized in that, before determining the extraction mode of the Chinese numerals according to the positional relationship between the target instruction participles included in the instruction participle set, the method further comprises:
obtaining a first crucial participle and a second crucial participle in the instruction participle set, wherein the first crucial participle is adjacent to the target instruction participle and located before the target instruction participle, and the second crucial participle is adjacent to the target instruction participle and located after the target instruction participle;
combining the first crucial participle, the target instruction participle and the second crucial participle to obtain a candidate field;
calling a composite number template and comparing it with the candidate field;
in the case where the candidate field matches the composite number template, extracting the target number according to the composite number template.
6. The method according to claim 2, characterized in that, after obtaining the number format of the number carried in the valid number information, the method further comprises:
in the case where the number format is Arabic numerals, extracting the number carried by the valid number information as the target number.
7. The method according to any one of claims 1 to 6, wherein determining the target instruction participles from the instruction participle set according to the part-of-speech tags comprises:
obtaining, from the instruction participle set, the instruction participles whose part-of-speech tag indicates a numeral as the target instruction participles, wherein an instruction participle whose part-of-speech tag indicates a numeral contains the valid number information.
8. The method according to any one of claims 1 to 6, characterized in that obtaining the instruction text that matches the input inquiry instruction comprises at least one of:
obtaining the inquiry instruction input by voice; identifying the instruction information carried in the inquiry instruction; generating the instruction text according to the instruction information;
obtaining the inquiry instruction input through an input device; parsing the inquiry instruction to obtain the instruction text.
9. A digital extraction apparatus, characterized by comprising:
an acquisition unit, configured to obtain an instruction text that matches an input inquiry instruction;
a processing unit, configured to perform segmentation and tagging on the instruction text to obtain an instruction participle set, wherein each instruction participle in the instruction participle set is configured with a part-of-speech tag;
a determination unit, configured to determine target instruction participles from the instruction participle set according to the part-of-speech tags, wherein the target instruction participles contain valid number information;
an extraction unit, configured to extract, according to the positional relationship between the target instruction participles included in the instruction participle set, a target number that matches the valid number information from the instruction text, wherein the target number is a number that can be recognized by a machine.
10. The apparatus according to claim 9, characterized in that the extraction unit comprises:
a first acquisition module, configured to obtain the number format of the numbers carried in all the valid number information included in the instruction participle set;
a determining module, configured to determine, in the case where the number format includes Chinese numerals, the extraction mode of the Chinese numerals according to the positional relationship between the target instruction participles included in the instruction participle set;
a first extraction module, configured to extract the target number according to the extraction mode.
11. The apparatus according to claim 10, characterized in that
the determining module comprises: a first determining submodule, configured to determine, in the case where the positions of at least two target instruction participles in the instruction participle set are continuous positions, and the data type of the valid number information included in the at least two target instruction participles is the integer type, that the extraction mode of the at least two target instruction participles is a combination extraction mode;
the first extraction module comprises: a first extraction submodule, configured to combine, according to the combination extraction mode, the at least two target instruction participles to obtain a combined instruction field, and to extract the target number that matches the combined instruction field.
12. The apparatus according to claim 10, characterized in that
the determining module comprises: a second determining submodule, configured to determine, in the case where the positions of the target instruction participles in the instruction participle set are discrete positions, that the extraction mode is a discrete extraction mode;
the first extraction module comprises: a second extraction submodule, configured to extract separately, according to the discrete extraction mode, the number carried by the valid number information included in each target instruction participle in the instruction participle set, as the target number.
13. The apparatus according to claim 10, characterized in that the extraction unit further comprises:
a second acquisition module, configured to obtain, before the extraction mode of the Chinese numerals is determined according to the positional relationship between the target instruction participles included in the instruction participle set, a first crucial participle and a second crucial participle in the instruction participle set, wherein the first crucial participle is adjacent to the target instruction participle and located before the target instruction participle, and the second crucial participle is adjacent to the target instruction participle and located after the target instruction participle;
a combination module, configured to combine the first crucial participle, the target instruction participle and the second crucial participle to obtain a candidate field;
a comparison module, configured to call a composite number template and compare it with the candidate field;
a second extraction module, configured to extract, in the case where the candidate field matches the composite number template, the target number according to the composite number template.
14. A storage medium, the storage medium comprising a stored program, wherein the program, when run, executes the method according to any one of claims 1 to 8.
15. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to execute, through the computer program, the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810961840.2A CN109299439B (en) | 2018-08-22 | 2018-08-22 | Digital extraction method and apparatus, storage medium, and electronic apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109299439A true CN109299439A (en) | 2019-02-01 |
CN109299439B CN109299439B (en) | 2021-05-11 |
Family
ID=65165415
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810961840.2A Active CN109299439B (en) | 2018-08-22 | 2018-08-22 | Digital extraction method and apparatus, storage medium, and electronic apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109299439B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114330243A (en) * | 2021-12-31 | 2022-04-12 | 北京执象科技发展有限公司 | Method and device for identifying oral calculation result, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101196881A (en) * | 2006-12-08 | 2008-06-11 | 富士通株式会社 | Words symbolization processing method and system for number and special symbol string in text |
US7836061B1 (en) * | 2007-12-29 | 2010-11-16 | Kaspersky Lab, Zao | Method and system for classifying electronic text messages and spam messages |
CN102184167A (en) * | 2011-05-25 | 2011-09-14 | 安徽科大讯飞信息科技股份有限公司 | Method and device for processing text data |
CN102915313A (en) * | 2011-08-05 | 2013-02-06 | 腾讯科技(深圳)有限公司 | Error correction relation generation method and system in web search |
CN107368466A (en) * | 2017-06-27 | 2017-11-21 | 成都准星云学科技有限公司 | A kind of name recognition methods and its system towards elementary mathematics field |
Non-Patent Citations (1)
Title |
---|
李烯: "基于关键词共现的教育信息化工程发展初探", 《电化教育研究》 * |
Also Published As
Publication number | Publication date |
---|---|
CN109299439B (en) | 2021-05-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||