CN109299439A - Digital extraction method and apparatus, storage medium and electronic device - Google Patents

Digital extraction method and apparatus, storage medium and electronic device

Info

Publication number
CN109299439A
Authority
CN
China
Prior art keywords
instruction
target
participle
extraction
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810961840.2A
Other languages
Chinese (zh)
Other versions
CN109299439B (en)
Inventor
包恒耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810961840.2A
Publication of CN109299439A
Application granted
Publication of CN109299439B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a digital extraction method and apparatus, a storage medium, and an electronic device. The method comprises: obtaining an instruction text that matches an input query instruction; performing word segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the set is assigned a part-of-speech tag; determining target instruction segments from the instruction segment set according to the part-of-speech tags, wherein the target instruction segments contain valid numeric information; and extracting, from the instruction text, a target number that matches the valid numeric information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize. The invention solves the technical problem in the related art that number extraction has low accuracy.

Description

Digital extraction method and apparatus, storage medium and electronic device
Technical field
The present invention relates to the field of computers, and in particular to a digital extraction method and apparatus, a storage medium, and an electronic device.
Background
Instructions that a user inputs to a hardware device often carry numeric information, for example numerals that indicate currency, time, length, or distance. So that the hardware device can perform the corresponding machine processing on the numbers carried in this numeric information, the numbers generally have to be extracted from the instruction first.
At present, after a hardware device obtains the instruction text corresponding to an instruction, the common extraction approach is to apply a regular-expression match to the instruction text in order to extract the numbers carried by the numeric information in it. However, instruction texts often contain special numbers, such as Chinese-character numerals that carry no numeric meaning, or compound numbers that mix Chinese-character numerals with Arabic numerals. For such special numbers, the extraction methods used in the related art yield low extraction accuracy.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a digital extraction method and apparatus, a storage medium, and an electronic device, so as to at least solve the technical problem in the related art that number extraction has low accuracy.
According to one aspect of the embodiments of the present invention, a digital extraction method is provided, comprising: obtaining an instruction text that matches an input query instruction; performing word segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the set is assigned a part-of-speech tag; determining target instruction segments from the instruction segment set according to the part-of-speech tags, wherein the target instruction segments contain valid numeric information; and extracting, from the instruction text, a target number that matches the valid numeric information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize.
According to another aspect of the embodiments of the present invention, a digital extraction apparatus is further provided, comprising: an obtaining unit, configured to obtain an instruction text that matches an input query instruction; a processing unit, configured to perform word segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the set is assigned a part-of-speech tag; a determination unit, configured to determine target instruction segments from the instruction segment set according to the part-of-speech tags, wherein the target instruction segments contain valid numeric information; and an extraction unit, configured to extract, from the instruction text, a target number that matches the valid numeric information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize.
As an optional example, the extraction unit includes: a third extraction module, configured to, after the number format of the numbers carried in the valid numeric information has been obtained and in the case that the number format is Arabic numerals, extract the number carried by the valid numeric information as the target number.
As an optional example, the determination unit includes: a third obtaining module, configured to obtain, from the instruction segment set, the instruction segments whose part-of-speech tag indicates a numeral, as the target instruction segments, wherein the instruction segments whose part-of-speech tag indicates a numeral contain the valid numeric information.
As an optional example, the obtaining unit includes at least one of: a fourth obtaining module, configured to obtain the query instruction input by voice, recognize the instruction information carried in the query instruction, and generate the instruction text according to the instruction information; and a fifth obtaining module, configured to obtain the query instruction input through an input device and parse the query instruction to obtain the instruction text.
According to yet another aspect of the embodiments of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, and the computer program is configured to perform the above digital extraction method when run.
According to yet another aspect of the embodiments of the present invention, an electronic device is further provided, including a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor performs the above digital extraction method by means of the computer program.
In the embodiments of the present invention, an instruction text that matches an input query instruction is obtained; word segmentation and tagging is performed on the instruction text to obtain an instruction segment set, in which each instruction segment is assigned a part-of-speech tag; target instruction segments are determined from the set according to the part-of-speech tags; and a target number that matches the valid numeric information is extracted from the instruction text according to the positional relationship between the target instruction segments in the set. Because the instruction text is first segmented and tagged, and every instruction segment in the resulting set carries a part-of-speech tag, the target instruction segments can be picked out by their tags and the target number can be extracted according to the positional relationship between them. This makes the extraction of the target number more accurate and efficient, and thereby solves the technical problem in the related art that number extraction has low accuracy.
Brief description of the drawings
The drawings described here are provided for further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a kind of schematic diagram of the application environment of digital extraction method according to an embodiment of the present invention;
Fig. 2 is a kind of flow diagram of digital extraction method according to an embodiment of the present invention;
Fig. 3 is a kind of schematic diagram of digital extraction method according to an embodiment of the present invention;
Fig. 4 is the schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 5 is the schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 6 is the schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 7 is the schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 8 is the schematic diagram of another digital extraction method according to an embodiment of the present invention;
Fig. 9 is a kind of structural schematic diagram of digital extraction device according to an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. Data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to one aspect of the embodiments of the present invention, a digital extraction method is provided. As an optional implementation, the digital extraction method may be applied to, but is not limited to, the environment shown in Fig. 1.
Human-computer interaction can take place between a user 102 and a user equipment 104. The user equipment 104 includes a memory 106 and a processor 108. The user equipment 104 can obtain a query instruction input by the user and, according to the query instruction, obtain the instruction text that matches it. After obtaining the instruction text, the user equipment 104 sends it to a server 112 over a network. The server 112 includes an index database 114, a word segmentation engine 116, and an extraction engine 118. After the server 112 receives the instruction text, it can store the text in the index database 114. The word segmentation engine 116 then segments the instruction text to obtain a segment set. The extraction engine 118 extracts the target number according to the positional relationship between the target instruction segments in the segment set. The server 112 returns the target number to the user equipment 104.
It should be noted that, in the related art, texts often contain Chinese characters without numeric meaning or compound numbers, so the numbers obtained from a text are not accurate. In this embodiment, by contrast, the instruction text is first segmented and tagged to obtain an instruction segment set in which every instruction segment carries a part-of-speech tag. When the target number is extracted, the target instruction segments are picked out by their part-of-speech tags and the target number is extracted according to the positional relationship between them, which makes the extraction more accurate and efficient.
Optionally, the digital extraction method may be applied to, but is not limited to, terminals capable of computing data, such as notebook computers, PCs, smartphones, smart speakers, smart home devices, and head-mounted devices. The network may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, Wi-Fi, and other networks that implement wireless communication. The wired network may include, but is not limited to, a wide area network, a metropolitan area network, and a local area network. The server may include, but is not limited to, any hardware device capable of computation.
Optionally, as an optional implementation, as shown in Fig. 2, the digital extraction method includes:
S202, obtaining an instruction text that matches an input query instruction;
S204, performing word segmentation and tagging on the instruction text to obtain an instruction segment set, wherein each instruction segment in the set is assigned a part-of-speech tag;
S206, determining target instruction segments from the instruction segment set according to the part-of-speech tags, wherein the target instruction segments contain valid numeric information;
S208, extracting, from the instruction text, a target number that matches the valid numeric information according to the positional relationship between the target instruction segments in the instruction segment set, wherein the target number is a number that a machine can recognize.
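As a minimal, non-limiting sketch, steps S202 to S208 might be arranged as follows for the simplest case of isolated Arabic numerals. The toy lexicon, the tag names ("m" for numeral, "x" for unknown), the function names, and the Chinese sample string (an assumed original of the "1000 mu of occupied area" example) are all hypothetical; a real implementation would rely on a proper word segmenter and part-of-speech tagger.

```python
from dataclasses import dataclass

NUMERAL = "m"  # assumed tag name for numerals

# Hypothetical toy lexicon: word -> set of part-of-speech tags.
LEXICON = {"占地": {"v"}, "面积": {"n"}, "亩": {"q"}}


@dataclass(frozen=True)
class Segment:
    index: int       # position of the segment in the segment set
    text: str
    tags: frozenset  # one or more part-of-speech tags


def segment_and_tag(text):
    """S204: split the text into segments and attach tags (toy rules)."""
    segments, i = [], 0
    while i < len(text):
        if text[i] in "0123456789":
            # A run of ASCII digits becomes one numeral segment.
            j = i
            while j < len(text) and text[j] in "0123456789":
                j += 1
            tags = {NUMERAL}
        else:
            # Longest lexicon match starting at i; fall back to one character.
            j = next((k for k in range(len(text), i, -1)
                      if text[i:k] in LEXICON), i + 1)
            tags = LEXICON.get(text[i:j], {"x"})
        segments.append(Segment(len(segments), text[i:j], frozenset(tags)))
        i = j
    return segments


def select_targets(segments):
    """S206: keep the segments whose tags mark them as numerals."""
    return [s for s in segments if NUMERAL in s.tags]


def extract_numbers(instruction_text):
    """S202 to S208 for isolated Arabic numerals only."""
    segments = segment_and_tag(instruction_text)  # S204
    targets = select_targets(segments)            # S206
    return [int(s.text) for s in targets]         # S208


print(extract_numbers("占地面积1000亩"))  # [1000]
```

The sketch keeps each step in its own function so that the tagging rules and the extraction rule can be replaced independently.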
Optionally, as an optional implementation, the digital extraction method may be applied to, but is not limited to, the compilation of financial reports, land-area assessment, or a population census.
With the above method, the instruction text is first segmented and tagged to obtain an instruction segment set in which every instruction segment carries a part-of-speech tag. When the target number is extracted, the target instruction segments are picked out by their part-of-speech tags and the target number is extracted according to the positional relationship between them, which makes the extraction more accurate and efficient.
Optionally, the instruction text that matches the input query instruction may be obtained in, but not limited to, the following ways:
(1) An input box is displayed on the display interface of the terminal; when content entered in the input box is received, the received content is used as the instruction text.
For example, an input box is displayed on the display interface of the terminal; after the text "1000 mu of occupied area" is entered in the input box, "1000 mu of occupied area" is used as the instruction text.
(2) A picture carrying the instruction text is received, text information is recognized from the picture, and the recognized text information is used as the instruction text.
For example, the terminal receives a picture carrying the text "1000 mu of occupied area", first recognizes the picture to identify the text "1000 mu of occupied area", then captures that text and uses the captured "1000 mu of occupied area" as the instruction text.
(3) After a selection instruction is received, the selected text is used as the instruction text.
Optionally, the selection instruction may be, but is not limited to, a button on the display interface of the terminal being pressed, a voice instruction input by the user, and so on.
For example, a button and some text content can be displayed on the display interface of the terminal. When the button is pressed, the text content selected by the user is used as the instruction text, and the subsequent digital extraction process is performed.
(4) Voice input information is obtained, and the obtained voice input information is used as the instruction text.
For example, voice input information from the user is received, such as "1000 mu of occupied area"; the obtained voice information is then converted into text information and used as the instruction text.
Optionally, performing word segmentation and tagging on the instruction text may include, but is not limited to, splitting the obtained instruction text into multiple separate fields and adding a part-of-speech tag to each field.
Optionally, adding a part-of-speech tag to each field may include, but is not limited to, determining the part of speech of each field. If the part of speech is noun, a noun tag is added to the field; if it is numeral, a numeral tag is added; if it is verb, a verb tag is added; if it is adjective, an adjective tag is added; if it is adverb, an adverb tag is added; and if the field is a character, a character tag is added.
Optionally, each field may correspond to one or more part-of-speech tags.
Optionally, take the instruction text "10 million examinees registered, 2 million examinees admitted." as an example; Fig. 3 shows one possible segmentation result. After the instruction text is obtained, it is segmented into the fields "registered", "examinees", "1", "ten million" (千万), ",", "admitted", 考, 生, "200", "ten thousand" (万), and ".". Here "ten million" (千万) carries two part-of-speech tags, numeral or adverb, and 生 likewise carries two tags, verb or adjective. A part-of-speech tag is added to each of these fields so that the fields can be distinguished from one another.
Optionally, determining the target instruction segments from the instruction segment set according to the part-of-speech tags may be, but is not limited to: obtaining, from the instruction segment set, the instruction segments whose part-of-speech tag indicates a numeral, as the target instruction segments, wherein the instruction segments whose tag indicates a numeral contain numeric information. The numeric information includes valid numeric information and invalid numeric information. Valid numeric information is a number with mathematical meaning, such as a number indicating quantity (for example 1,000,000 or 7000), a number indicating a year (such as "1998" in "the year 1998"), or a number indicating distance (such as "50" in "50 kilometers"). Invalid numeric information is a number without mathematical meaning; for example, the "seven" and "eight" in a Chinese idiom meaning "agitated" carry no mathematical meaning.
Optionally, obtaining the instruction segments whose tag indicates a numeral as the target instruction segments may include, but is not limited to: when at least one of a segment's part-of-speech tags is the numeral tag, taking that segment as a target instruction segment.
For example, continuing with the instruction text "10 million examinees registered, 2 million examinees admitted.": after the segmentation shown in Fig. 3, the four instruction segments "1", "ten million" (千万), "200", and "ten thousand" (万) are obtained. These four instruction segments are taken as the target instruction segments, and the target number is obtained from them.
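The selection rule above can be sketched as follows. This is an illustration only: the Chinese tokens are an assumed reconstruction of the example sentence, and the tag letters ("m" numeral, "d" adverb, "v" verb, "a" adjective, "n" noun, "w" punctuation) are hypothetical names rather than tags defined by the method.

```python
NUMERAL = "m"  # assumed tag name for numerals

# Each field carries one or more part-of-speech tags (see Fig. 3).
tagged_fields = [
    ("报名", {"v"}), ("考生", {"n"}), ("1", {"m"}), ("千万", {"m", "d"}),
    (",", {"w"}), ("录取", {"v"}), ("考", {"v"}), ("生", {"v", "a"}),
    ("200", {"m"}), ("万", {"m"}), ("。", {"w"}),
]


def select_target_segments(fields):
    """Keep (position, word) for fields with at least one numeral tag."""
    return [(i, word) for i, (word, tags) in enumerate(fields)
            if NUMERAL in tags]


print(select_target_segments(tagged_fields))
# [(2, '1'), (3, '千万'), (8, '200'), (9, '万')]
```

Keeping the position alongside each word matters because the later extraction steps depend on whether target segments are adjacent.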
Optionally, extracting, from the instruction text, the target number that matches the valid numeric information according to the positional relationship between the target instruction segments in the instruction segment set includes: obtaining the number format of the numbers carried in all the valid numeric information in the instruction segment set; in the case that the number format includes Chinese-character numerals, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments in the instruction segment set; and extracting the target number according to the extraction mode.
For example, take the instruction text "just in case a fault occurs, the loss will exceed 10 million yuan". In Chinese, "just in case" is written 万一, literally "ten thousand" followed by "one". After the instruction text is segmented and tagged, 万一 carries no numeric meaning and does not need to be extracted, whereas the 万 ("ten thousand") after "1000" is meaningful and needs to be extracted. The extraction mode for these Chinese-character numerals therefore has to be determined according to the positional relationship between the target instruction segments.
Optionally, the target number that matches the valid numeric information may be extracted from the instruction text as follows:
(1) When at least two target instruction segments occupy consecutive positions in the instruction segment set, and the data type of the valid numeric information in these target instruction segments is integer, the extraction mode for these segments is determined to be the combined extraction mode; according to the combined extraction mode, the at least two target instruction segments are combined into a combined instruction field; and the target number that matches the combined instruction field is extracted.
For example, continuing with the instruction text "just in case a fault occurs, the loss will exceed 10 million yuan": after the instruction text is segmented and tagged, the two target instruction segments "1000" and 万 ("ten thousand") are detected at consecutive positions, so they are combined into "1000万" (10 million) and extracted as the target number.
(2) When the target instruction segments occupy non-adjacent (discrete) positions in the instruction segment set, the extraction mode is determined to be the discrete extraction mode; according to the discrete extraction mode, the number carried by the valid numeric information in each target instruction segment is extracted separately as a target number.
(3) When the number format is Arabic numerals, the number carried by the valid numeric information is extracted as the target number.
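The three cases above can be illustrated with a short sketch. It assumes that target segments arrive as (position, word) pairs, that adjacent segments are combined by multiplying an Arabic integer by the Chinese unit that follows it, and that the unit table below is sufficient; all names are hypothetical, and the multiplication rule is a simplification that does not cover every Chinese numeral pattern.

```python
# Assumed multipliers for Chinese unit numerals (illustrative, not exhaustive).
UNITS = {"十": 10, "百": 100, "千": 1_000, "万": 10_000,
         "千万": 10_000_000, "亿": 100_000_000}


def segment_value(word):
    """Value of one target segment: a unit multiplier or an Arabic integer."""
    return UNITS[word] if word in UNITS else int(word)


def group_by_adjacency(targets):
    """Group (position, word) pairs whose positions are consecutive."""
    groups = []
    for index, word in targets:
        if groups and index == groups[-1][-1][0] + 1:
            groups[-1].append((index, word))  # continues the current run
        else:
            groups.append([(index, word)])    # starts a new run
    return groups


def extract_target_numbers(targets):
    """Combined mode for adjacent runs, discrete mode for isolated segments."""
    numbers = []
    for group in group_by_adjacency(targets):
        value = 1
        for _, word in group:
            value *= segment_value(word)
        numbers.append(value)
    return numbers


targets = [(2, "1"), (3, "千万"), (8, "200"), (9, "万")]
print(extract_target_numbers(targets))  # [10000000, 2000000]
```

An isolated segment forms a group of one, so the same loop also covers the discrete mode and plain Arabic numerals, for example `extract_target_numbers([(0, "3"), (5, "7")])` gives `[3, 7]`.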
Optionally, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments in the instruction segment set, the method further includes: obtaining a first key segment and a second key segment in the instruction segment set, wherein the first key segment is adjacent to the target instruction segment and located before it, and the second key segment is adjacent to the target instruction segment and located after it; combining the first key segment, the target instruction segment, and the second key segment to obtain a candidate field; calling a compound-number template and comparing it with the candidate field; and, in the case that the candidate field matches the compound-number template, extracting the target number according to the compound-number template.
Optionally, the compound-number template may be, but is not limited to, a fraction template, a percentage template, a decimal template, a negative-number template, and so on.
The digital extraction method is described as a whole below. Fig. 4 is a schematic diagram of an optional display interface of the terminal. Two buttons are displayed on the display interface. One is an input button for inputting the instruction text; after the input button is pressed, voice input information is captured, converted into text information, and displayed. The other is an extraction button; after the extraction button is pressed, the captured voice input information can be used as the instruction text, and the target number in the instruction text is extracted. Optionally, a selection instruction can be received when the instruction text is obtained, and the selected voice input information is used as the instruction text. As shown in Fig. 5, the underlined voice input information is the selected voice input information. After it is detected that the extraction button has been pressed, "profit exceeds 5 million, up ten percent year on year" is used as the instruction text.
After the instruction text is obtained, it is segmented and tagged, yielding the numeric segments "500", 万 ("ten thousand"), 百 ("hundred"), and 十 ("ten"). Because "500" and 万 occupy adjacent positions, they are combined into "500万" (5 million) and extracted, and the result is saved as a target number; alternatively, after "500万" is extracted as the target number, it is converted to "5000000" and saved. The "ten percent" (百分之十, literally "ten of a hundred parts") is likewise a target number that needs to be extracted, so it is compared against the preset compound-number template. Suppose the preset compound-number template is "**/**". After the comparison, "ten percent" is extracted as a target number. Optionally, after it is extracted, its format may be converted; for example, it may be converted to the decimal 0.1 and saved.
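A hedged sketch of the candidate-field comparison follows. It assumes that the template can be expressed as a regular expression over the Chinese pattern "denominator 分之 numerator", that the segmenter yields 百分之 and 十 as neighbouring segments, and that a small Chinese numeral parser is enough for the example; the function names, the regular expression, and the segmentation are hypothetical and are not taken from the patent.

```python
import re
from fractions import Fraction

CN_DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
CN_UNITS = {"十": 10, "百": 100, "千": 1000}
CN_NUM = "[零一二三四五六七八九十百千]+"

# Assumed fraction/percentage template: "<denominator>分之<numerator>".
FRACTION_TEMPLATE = re.compile(f"({CN_NUM})分之({CN_NUM})")


def cn_to_int(text):
    """Parse a small Chinese numeral such as 十, 百 or 二十五."""
    total, digit = 0, 0
    for ch in text:
        if ch in CN_DIGITS:
            digit = CN_DIGITS[ch]
        else:
            total += (digit or 1) * CN_UNITS[ch]  # a bare 十 means 1 x 10
            digit = 0
    return total + digit


def candidate_field(segments, target_index):
    """First key segment + target segment + second key segment."""
    before = segments[target_index - 1] if target_index > 0 else ""
    after = (segments[target_index + 1]
             if target_index + 1 < len(segments) else "")
    return before + segments[target_index] + after


def match_compound_number(segments, target_index):
    """Return the fraction if the candidate field fits the template."""
    match = FRACTION_TEMPLATE.search(candidate_field(segments, target_index))
    if match is None:
        return None
    return Fraction(cn_to_int(match.group(2)), cn_to_int(match.group(1)))


segments = ["同比", "增长", "百分之", "十", "。"]
value = match_compound_number(segments, 3)
print(value, float(value))  # 1/10 0.1
```

Returning an exact `Fraction` and converting to a decimal only when saving mirrors the optional format conversion described above.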
Through this embodiment, during extracting target number, due to first having been carried out at participle mark to instruction text Reason obtains instruction participle set, and in instruction participle set to each instruction participle configured with part of speech label, so as to Extract target number when, according to part of speech tag extraction go out target instruction target word participle, and according to target instruction target word segment between position close System extracts target number, so as to carry out the extraction of precise and high efficiency to target number, improves the standard for extracting target number True property.
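The conversion of a combined Chinese-character numeral into a machine-readable integer, as in the "five million" to "5000000" step above, could be sketched as follows. This is a minimal illustration and not the patented implementation. It assumes the examples were originally written with Chinese numerals (for instance 五百 followed by 万); the function name `chinese_to_int` and the lookup tables are hypothetical, and nested large units such as 万亿 are deliberately not handled.

```python
# Hypothetical lookup tables for a small subset of Chinese numerals.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def chinese_to_int(text: str) -> int:
    """Convert a simple Chinese integer numeral to an int."""
    total = 0     # sum of sections already closed by a big unit
    section = 0   # value accumulated inside the current section
    digit = 0     # digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 counts as 1 * 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a supported numeral character: {ch!r}")
    return total + section + digit


print(chinese_to_int("五百万"))
```

Keeping the running value in three parts (pending digit, current section, closed sections) avoids a separate tokenizing pass, at the cost of supporting only one level of large units.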
As an optional embodiment, extracting, from the instruction text, the target number matching the valid digital information according to the positional relationship between the target instruction segments included in the instruction segment set includes:
S1, acquiring the number format of the numbers carried in all valid digital information included in the instruction segment set;
S2, in a case where the number format includes Chinese-character numerals, determining an extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments included in the instruction segment set;
S3, extracting the target number according to the extraction mode.
For example, take the instruction text "just in case a fault occurs, the loss will exceed ten million yuan". Fig. 6 shows one possible result of segmenting and labeling this instruction text. As can be seen, "ten thousand", "one", "one thousand", and "ten thousand" are labeled as numerals. However, the first "ten thousand" and "one" together form the word meaning "just in case", which is not a meaningful number for extraction purposes, so "just in case" is not extracted.
In this embodiment, in a case where the instruction segment set includes Chinese-character numerals, the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments in the instruction segment set, which improves the flexibility and accuracy of target number extraction.
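Step S1, acquiring the number format of a segment, might look like the sketch below. It is only an illustration under stated assumptions: the function name `number_format`, the three return values, and the character set treated as Chinese-character numerals are all hypothetical and are not taken from the patent.

```python
import re

# Assumed set of characters that count as Chinese-character numerals.
CHINESE_NUMERALS = set("零一二两三四五六七八九十百千万亿")


def number_format(segment: str) -> str:
    """Classify a segment as 'arabic', 'chinese', or 'other'."""
    if re.fullmatch(r"\d+(?:\.\d+)?", segment):
        return "arabic"
    if segment and all(ch in CHINESE_NUMERALS for ch in segment):
        return "chinese"
    return "other"


for seg in ["23", "一千", "元"]:
    print(seg, number_format(seg))
```

A segment classified as "chinese" would then go on to step S2, while an "arabic" segment can be extracted directly.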
As an optional embodiment:
S1, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments included in the instruction segment set includes: in a case where the positions of at least two target instruction segments in the instruction segment set are contiguous, and the data type of the valid digital information included in the at least two target instruction segments is an integer type, determining that the extraction mode of the at least two target instruction segments is a combination extraction mode;
S2, extracting the target number according to the extraction mode includes: combining the at least two target instruction segments according to the combination extraction mode to obtain a combined instruction field; and extracting the target number matching the combined instruction field.
For example, continuing with the instruction text "just in case a fault occurs, the loss will exceed ten million yuan": after the instruction text is segmented and labeled, as shown in Fig. 7, "one thousand" and "ten thousand" are extracted and combined to obtain "ten million", and "ten million" is used as the target number.
In this embodiment, the target number in the instruction segment set is extracted according to the combination extraction mode, so that the target number in the instruction segment set can be extracted accurately and efficiently according to the actual situation, which improves the efficiency of target number extraction.
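A minimal sketch of the combination extraction mode follows. It assumes each segment records its text, a part-of-speech label, and its position in the segment set; the `Segment` class, the label name "numeral", and the integer test based on a decimal marker are hypothetical choices made for illustration, and the Chinese text is an assumed rendering of the example above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    label: str   # hypothetical label name; "numeral" marks a number
    index: int   # position within the instruction segment set


def is_integer_type(seg: Segment) -> bool:
    # Assumption: a numeral without a decimal marker is an integer.
    return "点" not in seg.text and "." not in seg.text


def combine_contiguous(segments):
    """Join runs of two or more adjacent integer-type numeral segments."""
    targets = [s for s in segments
               if s.label == "numeral" and is_integer_type(s)]
    groups = []
    for seg in targets:
        if groups and seg.index == groups[-1][-1].index + 1:
            groups[-1].append(seg)   # contiguous with the previous target
        else:
            groups.append([seg])     # start a new run
    return ["".join(s.text for s in g) for g in groups if len(g) >= 2]


example = [
    Segment("损失", "noun", 0), Segment("将", "adverb", 1),
    Segment("超过", "verb", 2), Segment("一千", "numeral", 3),
    Segment("万", "numeral", 4), Segment("元", "noun", 5),
]
print(combine_contiguous(example))
```

Runs of length one are left out on purpose, since the combination mode described above applies only when at least two target segments are contiguous.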
As an optional embodiment:
S1, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments included in the instruction segment set includes: in a case where the positions of the target instruction segments in the instruction segment set are discrete, determining that the extraction mode is a discrete extraction mode;
S2, extracting the target number according to the extraction mode includes: according to the discrete extraction mode, separately extracting the number carried by the valid digital information included in each target instruction segment in the instruction segment set, as the target number.
For example, take the instruction text "a profit of twenty point three ten-thousand yuan". Because "twenty" and "thirty thousand" are not contiguous in the instruction text, they need to be extracted in the discrete extraction mode, which yields the target number "20.3 ten thousand".
In this embodiment, the target number in the instruction segment set is extracted according to the discrete extraction mode, so that the target number in the instruction segment set can be extracted accurately and efficiently according to the actual situation, which improves the efficiency of target number extraction.
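The choice between the combination and discrete extraction modes can be sketched as a check on the positions of the target segments. The function names and the list-of-positions representation are assumptions made for this illustration. The sketch returns each discrete segment separately, as step S2 describes, and does not attempt the further step of joining the parts across the decimal marker into a single value.

```python
def extraction_mode(positions):
    """Return 'combination' if any two target positions are adjacent."""
    adjacent = any(b - a == 1 for a, b in zip(positions, positions[1:]))
    return "combination" if adjacent else "discrete"


def extract_discrete(segments, positions):
    """Extract each target segment on its own, in order."""
    return [segments[i] for i in positions]


# Assumed rendering of the example: profit / twenty / point / thirty thousand / yuan
segments = ["盈利", "二十", "点", "三万", "元"]
targets = [1, 3]  # sorted positions of the numeral segments

print(extraction_mode(targets))
print(extract_discrete(segments, targets))
```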
As an optional embodiment, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments included in the instruction segment set, the method further includes:
S1, acquiring a first key segment and a second key segment in the instruction segment set, where the first key segment is adjacent to the target instruction segment and located before it, and the second key segment is adjacent to the target instruction segment and located after it;
S2, combining the first key segment, the target instruction segment, and the second key segment to obtain a candidate field;
S3, calling a composite number template and comparing it with the candidate field;
S4, in a case where the candidate field matches the composite number template, extracting the target number according to the composite number template.
Optionally, the first key segment and the second key segment may be, but are not limited to, certain meaningful words, for example words used to express numbers such as fractions, decimals, and negative numbers.
For example, take the instruction text "one eighth of the goods have been sold". Fig. 8 is a schematic diagram of several optional composite number templates. After "one eighth" is matched against the composite number templates, the target number is extracted using the "* */* *" template, which yields one eighth. After one eighth is obtained, it may be converted into other formats.
In this embodiment, the target number is extracted by calling a composite number template, so that digital information in complex cases can be extracted, which improves the flexibility and accuracy of target number extraction.
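Matching a candidate field against a fraction-style composite number template could be sketched with a regular expression. This is an assumption-laden illustration: it supposes that the "* */* *" template corresponds to the Chinese pattern "denominator 分之 numerator", handles only single-character numerals through the hypothetical `SIMPLE` table, and uses a made-up function name `match_fraction`.

```python
import re
from fractions import Fraction

# Hypothetical lookup limited to single-character numerals.
SIMPLE = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6,
          "七": 7, "八": 8, "九": 9, "十": 10, "百": 100, "千": 1000}

# Assumed template: <denominator>分之<numerator>
FRACTION_TEMPLATE = re.compile(r"(.)分之(.)")


def match_fraction(candidate: str):
    """Return the fraction in the candidate field, or None if no match."""
    m = FRACTION_TEMPLATE.search(candidate)
    if m is None:
        return None
    denominator, numerator = m.group(1), m.group(2)
    if denominator not in SIMPLE or numerator not in SIMPLE:
        return None
    return Fraction(SIMPLE[numerator], SIMPLE[denominator])


print(match_fraction("八分之一的商品"))   # one eighth of the goods
print(float(match_fraction("百分之十")))  # ten percent as a decimal
```

Returning a `Fraction` keeps the extracted value exact, and converting it with `float` gives the decimal form mentioned for the "ten percent" example.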
As an optional embodiment, after the number format of the numbers carried in the valid digital information is acquired, the method further includes:
S1, in a case where the number format is Arabic numerals, extracting the number carried by the valid digital information as the target number.
For example, take the instruction text "23 degrees north latitude, 67 degrees east longitude". After this instruction text is acquired, because the number format in the instruction text is Arabic numerals, the Arabic numerals in the instruction text can be extracted directly to obtain the target numbers.
In this embodiment, Arabic numerals are extracted directly, so that the target number can be extracted accurately and efficiently in a case where the number format is Arabic numerals, which improves the efficiency of target number extraction.
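Direct extraction of Arabic numerals needs no segmentation-based combination and can be sketched with a single regular expression. The function name `extract_arabic` is hypothetical, and the pattern (an integer with an optional decimal part) is an assumption about which forms should count.

```python
import re

# Assumed pattern: digits with an optional decimal part.
ARABIC_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_arabic(text: str):
    """Return every Arabic-numeral run found in the text, in order."""
    return ARABIC_NUMBER.findall(text)


print(extract_arabic("23 degrees north latitude, 67 degrees east longitude"))
```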
As an optional embodiment, determining the target instruction segments from the instruction segment set according to the part-of-speech labels includes:
S1, acquiring, from the instruction segment set, the instruction segments whose part-of-speech label indicates a numeral, as the target instruction segments, where the instruction segments whose part-of-speech label indicates a numeral include valid digital information.
For example, continuing with the instruction text "23 degrees north latitude, 67 degrees east longitude": after "23 degrees north latitude, 67 degrees east longitude" is segmented and labeled, the part of speech of "23" and "67" is found to be numeral. "23" and "67", whose part of speech is numeral, are then extracted as the target numbers.
In this embodiment, the instruction segments whose part-of-speech label is numeral are acquired from the instruction segment set as the target instruction segments, so that the target instruction segments can be extracted from the instruction segment set, which improves the flexibility of acquiring the target instruction segments.
As an optional embodiment, acquiring the instruction text matching the input query instruction includes at least one of the following:
(1) acquiring a query instruction input by voice; recognizing the instruction information carried in the query instruction; and generating the instruction text according to the instruction information;
(2) acquiring a query instruction input through an input device; and parsing the query instruction to obtain the instruction text.
For example, Fig. 4 is a schematic diagram of the display interface of an optional terminal. Two buttons are displayed on the display interface of the terminal. One is an input button for inputting instruction text: after the input button is pressed, voice input information is collected, converted into text information, and displayed. The other is an extraction button: after the extraction button is pressed, the collected voice input information is used as the instruction text, and the target number in the instruction text is extracted. Optionally, a selection instruction may be received when the instruction text is acquired, and the selected voice input information is used as the instruction text. As shown in Fig. 5, the underlined voice input information is the voice input information that has been selected. After it is detected that the extraction button is pressed, "profit exceeds five million, up ten percent year on year" is used as the instruction text.
In this embodiment, the instruction text is acquired by any one of the above methods, which improves the flexibility of acquiring the instruction text.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
According to another aspect of the embodiments of the present invention, a number extraction apparatus for implementing the above number extraction method is further provided. As shown in Fig. 9, the apparatus includes:
(1) an acquiring unit 902, configured to acquire the instruction text matching the input query instruction;
(2) a processing unit 904, configured to perform word segmentation and labeling on the instruction text to obtain an instruction segment set, where each instruction segment in the instruction segment set is configured with a part-of-speech label;
(3) a determination unit 906, configured to determine target instruction segments from the instruction segment set according to the part-of-speech labels, where the target instruction segments include valid digital information;
(4) an extraction unit 908, configured to extract, from the instruction text, the target number matching the valid digital information according to the positional relationship between the target instruction segments included in the instruction segment set, where the target number is a number that a machine is able to recognize.
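The division of labor among the four units can be illustrated with a toy pipeline. Everything here is an assumption made for illustration: the class and method names are hypothetical, the "tagger" merely labels runs of Arabic digits as numerals, and a real apparatus would use a proper word segmenter and part-of-speech tagger in the processing unit.

```python
import re


class NumberExtractionDevice:
    """Toy pipeline mirroring units 902, 904, 906 and 908."""

    def acquire(self, query: str) -> str:
        # Acquiring unit: obtain the instruction text from the query.
        return query.strip()

    def process(self, text: str):
        # Processing unit: split into segments and attach a label.
        # Toy rule: a run of digits is a "numeral", anything else "other".
        return [(m.group(), "numeral" if m.group().isdigit() else "other")
                for m in re.finditer(r"\d+|\D+", text)]

    def determine(self, tagged):
        # Determination unit: positions of segments labeled as numerals.
        return [i for i, (_, label) in enumerate(tagged) if label == "numeral"]

    def extract(self, tagged, targets):
        # Extraction unit: turn target segments into machine-readable ints.
        return [int(tagged[i][0]) for i in targets]

    def run(self, query: str):
        tagged = self.process(self.acquire(query))
        return self.extract(tagged, self.determine(tagged))


print(NumberExtractionDevice().run("an area of 1000 mu"))
```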
Optionally, the number extraction apparatus may be, but is not limited to being, applied to compiling financial reports, to land-area assessment, or to census taking.
Optionally, the number extraction apparatus may be, but is not limited to being, applied on an intelligent terminal, for example on a mobile phone.
With the above apparatus, word segmentation and labeling is first performed on the instruction text to obtain an instruction segment set in which each instruction segment is configured with a part-of-speech label. When the target number is extracted, the target instruction segments are therefore selected according to the part-of-speech labels, and the target number is extracted according to the positional relationship between the target instruction segments. This allows the target number to be extracted accurately and efficiently and improves the accuracy of target number extraction.
Optionally, acquiring the instruction text matching the input query instruction may be, but is not limited to being, performed in the following ways:
(1) An input box is displayed on the display interface of the terminal, and when content entered in the input box is received, the received content is used as the instruction text.
For example, an input box is displayed on the display interface of the terminal, and after the words "an area of 1000 mu" are received in the input box, "an area of 1000 mu" is used as the instruction text.
(2) A picture carrying the instruction text is received, text information is recognized from the picture, and the recognized text information is used as the instruction text.
For example, the terminal receives a picture carrying the words "an area of 1000 mu". The picture is first recognized to identify the words "an area of 1000 mu", the text is collected, and the collected "an area of 1000 mu" is used as the instruction text.
(3) After a selection instruction is received, the selected text is used as the instruction text.
Optionally, receiving the selection instruction may be, but is not limited to, a button on the display interface of the terminal being pressed, a voice instruction input by the user being received, and the like.
For example, a button and text content may be displayed on the display interface of the terminal. When the button is pressed, the text content selected by the user is used as the instruction text, and the subsequent number extraction process is performed.
(4) Voice input information is acquired, and the acquired voice input information is used as the instruction text.
For example, voice input information from the user is received, such as "an area of 1000 mu". The acquired voice information is then converted into text information and used as the instruction text.
Optionally, performing word segmentation and labeling on the instruction text may be, but is not limited to, splitting the acquired instruction text into multiple individual fields and adding a part-of-speech label to each field.
Optionally, adding a part-of-speech label to each field may be, but is not limited to, judging the part of speech of each field. Where the part of speech is noun, a noun label is added to the field; where it is numeral, a numeral label is added; where it is verb, a verb label is added; where it is adjective, an adjective label is added; where it is adverb, an adverb label is added. Where the field is a character, a character label is added to the field.
Optionally, each field may correspond to one or more part-of-speech labels.
Optionally, take the instruction text "ten million examinees registered, two million examinees admitted." as an example. Fig. 3 shows one possible segmentation result. After the instruction text is acquired, it is segmented into multiple fields such as "register", "examinee", "one", "ten million", ",", "admit", "exam", "student", "two hundred", "ten thousand", and ".". The field "ten million" carries two part-of-speech labels, numeral or adverb (the same word can also mean "necessarily"), and the field "student" likewise carries two part-of-speech labels, verb or adjective. A part-of-speech label is added to each of these fields so that the fields can be distinguished from one another.
Optionally, determining the target instruction segments from the instruction segment set according to the part-of-speech labels may be, but is not limited to: acquiring, from the instruction segment set, the instruction segments whose part-of-speech label indicates a numeral, as the target instruction segments, where the instruction segments whose part-of-speech label indicates a numeral include digital information. The digital information includes valid digital information and invalid digital information. Valid digital information is a number with mathematical meaning, such as a number expressing a quantity (for example 1,000,000 or 7000), a number expressing a year (for example "1998" in "the year 1998"), or a number expressing a distance (for example "50" in "50 kilometers"). Invalid digital information is a number without mathematical meaning. For example, the "seven" and "eight" in the idiom meaning "agitated" have no mathematical meaning.
Optionally, acquiring the instruction segments whose part-of-speech label indicates a numeral as the target instruction segments may be, but is not limited to: in a case where at least one part-of-speech label of an instruction segment is numeral, acquiring the instruction segment corresponding to that numeral label as a target instruction segment.
For example, continuing with the instruction text "ten million examinees registered, two million examinees admitted.": after the segmentation shown in Fig. 3, the four instruction segments "one", "ten million", "two hundred", and "ten thousand" are obtained. These four instruction segments are used as the target instruction segments, and the target numbers are obtained from them.
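Selecting target instruction segments when a segment may carry more than one part-of-speech label can be sketched as a membership test on a label set. The label names, the tuple representation, and the Chinese rendering of the example are assumptions for illustration. Note that this sketch does not separate valid from invalid digital information; that distinction would need an additional check.

```python
def select_targets(tagged):
    """Keep segments for which at least one label is 'numeral'."""
    return [text for text, labels in tagged if "numeral" in labels]


# Assumed segmentation of the example sentence, with hypothetical labels.
tagged = [
    ("报名", {"verb"}), ("考生", {"noun"}),
    ("一", {"numeral"}), ("千万", {"numeral", "adverb"}),
    ("，", {"character"}),
    ("录取", {"verb"}), ("考", {"verb"}), ("生", {"verb", "adjective"}),
    ("二百", {"numeral"}), ("万", {"numeral"}),
    ("。", {"character"}),
]

print(select_targets(tagged))
```

Storing the labels as a set makes the "at least one label is numeral" condition a single membership test, regardless of how many labels a segment carries.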
Optionally, extracting, from the instruction text, the target number matching the valid digital information according to the positional relationship between the target instruction segments included in the instruction segment set includes: acquiring the number format of the numbers carried in all valid digital information included in the instruction segment set; in a case where the number format includes Chinese-character numerals, determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments included in the instruction segment set; and extracting the target number according to the extraction mode.
For example, take the instruction text "just in case a fault occurs, the loss will exceed ten million yuan". After the instruction text is segmented and labeled with parts of speech, "just in case" is not a meaningful number and does not need to be extracted, whereas the "ten thousand" after "one thousand" is meaningful and needs to be extracted. The extraction mode for the Chinese-character numerals therefore has to be determined according to the positional relationship between the target instruction segments.
Optionally, the target number matching the valid digital information may be extracted from the instruction text by the following methods:
(1) In a case where the positions of at least two target instruction segments in the instruction segment set are contiguous, and the data type of the valid digital information included in the at least two target instruction segments is an integer type, the extraction mode of the at least two target instruction segments is determined to be the combination extraction mode. According to the combination extraction mode, the at least two target instruction segments are combined to obtain a combined instruction field, and the target number matching the combined instruction field is extracted.
For example, continuing with the instruction text "just in case a fault occurs, the loss will exceed ten million yuan": after the instruction text is segmented and labeled with parts of speech, the positions of the two target instruction segments "one thousand" and "ten thousand" are detected to be contiguous, so "one thousand" and "ten thousand" are combined into "ten million" and extracted to obtain the target number.
(2) In a case where the positions of the target instruction segments in the instruction segment set are discrete, the extraction mode is determined to be the discrete extraction mode. According to the discrete extraction mode, the number carried by the valid digital information included in each target instruction segment in the instruction segment set is extracted separately as the target number.
(3) In a case where the number format is Arabic numerals, the number carried by the valid digital information is extracted as the target number.
Optionally, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments included in the instruction segment set, the method further includes: acquiring a first key segment and a second key segment in the instruction segment set, where the first key segment is adjacent to the target instruction segment and located before it, and the second key segment is adjacent to the target instruction segment and located after it; combining the first key segment, the target instruction segment, and the second key segment to obtain a candidate field; calling a composite number template and comparing it with the candidate field; and, in a case where the candidate field matches the composite number template, extracting the target number according to the composite number template.
Optionally, the composite number template may be, but is not limited to, a fraction template, a percentage template, a decimal template, a negative-number template, and the like.
The number extraction method is described as a whole below. Fig. 4 is a schematic diagram of the display interface of an optional terminal. Two buttons are displayed on the display interface of the terminal. One is an input button for inputting instruction text: after the input button is pressed, voice input information is collected, converted into text information, and displayed. The other is an extraction button: after the extraction button is pressed, the collected voice input information is used as the instruction text, and the target number in the instruction text is extracted. Optionally, a selection instruction may be received when the instruction text is acquired, and the selected voice input information is used as the instruction text. As shown in Fig. 5, the underlined voice input information is the voice input information that has been selected. After it is detected that the extraction button is pressed, "profit exceeds five million, up ten percent year on year" is used as the instruction text.
After the instruction text is acquired, it is segmented and labeled, which yields the digital information "five hundred", "ten thousand", "hundred", and "ten". Because the positions of "five hundred" and "ten thousand" are adjacent, the two are combined into "five million", which is extracted and saved as a target number. Alternatively, after "five million" is extracted as a target number, it is converted into "5000000" and saved. "Ten percent" is likewise a target number that needs to be extracted, so it is compared with a preset composite number template. The preset composite number template is "* */* *". After the comparison, "ten percent" is extracted as a target number. Optionally, after "ten percent" is extracted, its format may be, but is not limited to being, converted, for example into the decimal 0.1, and then saved.
In this embodiment, word segmentation and labeling is first performed on the instruction text to obtain an instruction segment set in which each instruction segment is configured with a part-of-speech label. When the target number is extracted, the target instruction segments are therefore selected according to the part-of-speech labels, and the target number is extracted according to the positional relationship between the target instruction segments. This allows the target number to be extracted accurately and efficiently and improves the accuracy of target number extraction.
As an optional embodiment, the extraction unit includes:
(1) a first acquiring module, configured to acquire the number format of the numbers carried in all valid digital information included in the instruction segment set;
(2) a determining module, configured to determine, in a case where the number format includes Chinese-character numerals, the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction segments included in the instruction segment set;
(3) a first extraction module, configured to extract the target number according to the extraction mode.
For example, take the instruction text "just in case a fault occurs, the loss will exceed ten million yuan". Fig. 6 shows one possible result of segmenting and labeling this instruction text. As can be seen, "ten thousand", "one", "one thousand", and "ten thousand" are labeled as numerals. However, the first "ten thousand" and "one" together form the word meaning "just in case", which is not a meaningful number for extraction purposes, so "just in case" is not extracted.
In this embodiment, in a case where the instruction segment set includes Chinese-character numerals, the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction segments in the instruction segment set, which improves the flexibility and accuracy of target number extraction.
As an optional embodiment:
(1) the determining module includes a first determining submodule, configured to determine, in a case where the positions of at least two target instruction segments in the instruction segment set are contiguous and the data type of the valid digital information included in the at least two target instruction segments is an integer type, that the extraction mode of the at least two target instruction segments is the combination extraction mode;
(2) the first extraction module includes a first extraction submodule, configured to combine the at least two target instruction segments according to the combination extraction mode to obtain a combined instruction field, and to extract the target number matching the combined instruction field.
For example, continuing with the instruction text "just in case a fault occurs, the loss will exceed ten million yuan": after the instruction text is segmented and labeled, as shown in Fig. 7, "one thousand" and "ten thousand" are extracted and combined to obtain "ten million", and "ten million" is used as the target number.
In this embodiment, the target number in the instruction segment set is extracted according to the combination extraction mode, so that the target number in the instruction segment set can be extracted accurately and efficiently according to the actual situation, which improves the efficiency of target number extraction.
As an optional embodiment:
(1) the determining module includes a second determining submodule, configured to determine, in a case where the positions of the target instruction segments in the instruction segment set are discrete, that the extraction mode is the discrete extraction mode;
(2) the first extraction module includes a second extraction submodule, configured to separately extract, according to the discrete extraction mode, the number carried by the valid digital information included in each target instruction segment in the instruction segment set, as the target number.
For example, take the instruction text "a profit of twenty point three ten-thousand yuan". Because "twenty" and "thirty thousand" are not contiguous in the instruction text, they need to be extracted in the discrete extraction mode, which yields the target number "20.3 ten thousand".
In this embodiment, the target number in the instruction segment set is extracted according to the discrete extraction mode, so that the target number in the instruction segment set can be extracted accurately and efficiently according to the actual situation, which improves the efficiency of target number extraction.
As an optional embodiment, the extraction unit further includes:
(1) a second obtaining module, configured to obtain, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction participles included in the instruction participle set, a first key participle and a second key participle in the instruction participle set, where the first key participle is adjacent to and located before the target instruction participle, and the second key participle is adjacent to and located after the target instruction participle;
(2) a combination module, configured to combine the first key participle, the target instruction participle, and the second key participle to obtain a candidate field;
(3) a comparison module, configured to call a composite number template and compare it with the candidate field;
(4) a second extraction module, configured to extract the target number according to the composite number template when the candidate field matches the composite number template.
Optionally, the first key participle and the second key participle may be, but are not limited to, certain meaningful words, for example words that indicate a fraction, a decimal, or a negative number.
For example, take the instruction text "one eighth of the commodities are sold". Fig. 8 is a schematic diagram of several optional composite number templates. After "one eighth" is matched against these composite number templates, the template "* */* *" is used to extract the target number, giving 1/8. The extracted 1/8 can then be converted into other formats.
In this embodiment, the target number is extracted by calling a composite number template, so that number information in complex cases can be extracted, which improves the flexibility and accuracy of extracting the target number.
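The template comparison above can be illustrated with a small sketch. It swaps in regular expressions for the templates of Fig. 8, whose exact syntax is not given here. The three patterns, the single-digit numeral table, the segmentation of the candidate field, and the name `match_template` are all assumptions made for illustration.

```python
import re
from fractions import Fraction

# Assumed single-character numerals; multi-character numerals are out of scope.
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
D = "[" + "".join(DIGITS) + "]"

# Hypothetical stand-ins for composite number templates.
TEMPLATES = [
    ("fraction", re.compile(rf"^({D})分之({D})$")),  # e.g. 八分之一
    ("decimal", re.compile(rf"^({D})点({D})$")),     # e.g. 三点五
    ("negative", re.compile(rf"^负({D})$")),         # e.g. 负七
]


def match_template(candidate):
    """Return the number encoded by the candidate field, or None if no template matches."""
    for name, pattern in TEMPLATES:
        m = pattern.match(candidate)
        if not m:
            continue
        values = [DIGITS[g] for g in m.groups()]
        if name == "fraction":
            if values[0] == 0:
                return None  # a zero denominator is not a valid fraction
            # Chinese fractions name the denominator first.
            return Fraction(values[1], values[0])
        if name == "decimal":
            return values[0] + values[1] / 10
        return -values[0]
    return None


# Candidate field joined from three adjacent tokens (hypothetical segmentation).
candidate = "".join(["八", "分之", "一"])
print(match_template(candidate))         # 1/8
print(float(match_template(candidate)))  # 0.125
```

Returning a `Fraction` keeps the extracted 1/8 exact and makes the later conversion to another format, such as a decimal, a single call.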
As an optional embodiment, the extraction unit further includes:
(1) a third extraction module, configured to, after the number format of the number carried in the effective digital information is obtained, extract the number carried in the effective digital information as the target number when the number format is Arabic numerals.
For example, take the instruction text "23 degrees north latitude, 67 degrees east longitude". After this instruction text is obtained, because the numbers in it are in Arabic numeral format, the Arabic numerals can be extracted directly to obtain the target number.
In this embodiment, Arabic numerals are extracted directly, so that when the number format is Arabic numerals the target number is extracted accurately and efficiently, which improves the efficiency of extracting the target number.
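Direct extraction of Arabic numerals can be sketched with a regular expression. This is one plausible way to do it, not the method the patent prescribes. The pattern and the function name `extract_arabic_numbers` are assumptions, and signs, thousands separators, and full-width digits are deliberately not handled.

```python
import re

# Unsigned integers and simple decimals written with ASCII digits.
ARABIC_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def extract_arabic_numbers(text):
    """Return every Arabic-numeral number in the text, as int or float."""
    return [float(s) if "." in s else int(s) for s in ARABIC_NUMBER.findall(text)]


print(extract_arabic_numbers("23 degrees north latitude, 67 degrees east longitude"))
# [23, 67]
```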
As an optional embodiment, the determination unit includes:
(1) a third obtaining module, configured to obtain, from the instruction participle set, the instruction participles whose part-of-speech label indicates a number, as the target instruction participles, where an instruction participle whose part-of-speech label indicates a number contains effective digital information.
For example, continue with the instruction text "23 degrees north latitude, 67 degrees east longitude". After this text is segmented and labeled, "23" and "67" are labeled with the part of speech "number". The participles "23" and "67", whose part of speech is number, are then extracted as the target number.
In this embodiment, the instruction participles whose part-of-speech label is a number are obtained from the instruction participle set as the target instruction participles, so that the target instruction participles can be extracted from the instruction participle set, which improves the flexibility of obtaining the target instruction participles.
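Selecting target instruction participles by part-of-speech label amounts to a filter over tagged tokens. In the sketch below the tagged input is written by hand. The Chinese tokens, the tag names ("m" for number, "q" for measure word, and so on), and the function name `select_target_tokens` are assumptions, and no particular segmentation library is implied.

```python
def select_target_tokens(tagged_tokens, number_tag="m"):
    """Return (position, token) pairs for tokens whose tag marks them as numbers."""
    return [(i, tok) for i, (tok, tag) in enumerate(tagged_tokens) if tag == number_tag]


# Hand-tagged stand-in for "23 degrees north latitude, 67 degrees east longitude".
tagged = [("北纬", "ns"), ("23", "m"), ("度", "q"), ("，", "x"),
          ("东经", "ns"), ("67", "m"), ("度", "q")]
print(select_target_tokens(tagged))  # [(1, '23'), (5, '67')]
```

Keeping the position alongside each token preserves the positional relationship that the later extraction step relies on.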
As an optional embodiment, the acquiring unit includes at least one of the following:
(1) a fourth obtaining module, configured to obtain an inquiry instruction input by voice, recognize the instruction information carried in the inquiry instruction, and generate the instruction text according to the instruction information;
(2) a fifth obtaining module, configured to obtain an inquiry instruction input through an input device, and parse the inquiry instruction to obtain the instruction text.
For example, Fig. 4 is a schematic diagram of the display interface of an optional terminal. Two buttons are shown on the display interface. One is an input button for entering the instruction text: after it is pressed, voice input information is acquired, converted into text information, and displayed. The other is an extraction button: after it is pressed, the collected voice input information is used as the instruction text, and the target number in the instruction text is extracted. Optionally, a selection instruction may be received when the instruction text is obtained, and the selected voice input information is used as the instruction text. As shown in Fig. 5, the underlined voice input information is the selected voice input information. After it is detected that the extraction button is pressed, "profit exceeds 5,000,000, a year-on-year increase of 10" is used as the instruction text.
In this embodiment, the instruction text is obtained by any one of the above methods, which improves the flexibility of obtaining the instruction text.
According to another aspect of the embodiments of the present invention, an electronic device for implementing the above digital extraction method is further provided. As shown in Fig. 10, the electronic device includes a memory 1002 and a processor 1004. A computer program is stored in the memory 1002, and the processor 1004 is configured to execute, by means of the computer program, the steps in any of the above method embodiments.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program:
S1, obtain the instruction text that matches the input inquiry instruction;
S2, perform word segmentation and labeling on the instruction text to obtain an instruction participle set, where each instruction participle in the instruction participle set is configured with a part-of-speech label;
S3, determine the target instruction participles from the instruction participle set according to the part-of-speech labels, where a target instruction participle contains effective digital information;
S4, according to the positional relationship between the target instruction participles included in the instruction participle set, extract from the instruction text the target number that matches the effective digital information, where the target number is a number that a machine is allowed to recognize.
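The data flow of steps S1 to S4 can be sketched end to end as below. This is a toy under stated assumptions: the tagger is a regular expression rather than a real segmenter, only Arabic numerals are recognized, and S4 is reduced to direct conversion, so the position-dependent extraction modes are not modelled. All function names are hypothetical.

```python
import re


def acquire_text(query):
    """S1: obtain the instruction text for an input query (here: just trim it)."""
    return query.strip()


def segment_and_tag(text):
    """S2 (toy): split into tokens; tag ASCII-digit numbers "m", everything else "x"."""
    tokens = re.findall(r"[0-9]+(?:\.[0-9]+)?|[^\W\d_]+|[^\w\s]", text)
    return [(tok, "m" if tok[0] in "0123456789" else "x") for tok in tokens]


def select_targets(tagged):
    """S3: keep the positions and tokens labeled as numbers."""
    return [(i, tok) for i, (tok, tag) in enumerate(tagged) if tag == "m"]


def extract_numbers(targets):
    """S4 (simplified): convert each target token to a machine-readable number."""
    return [float(tok) if "." in tok else int(tok) for _, tok in targets]


def run_pipeline(query):
    tagged = segment_and_tag(acquire_text(query))
    return extract_numbers(select_targets(tagged))


print(run_pipeline(" 23 degrees north latitude, 67 degrees east longitude "))
# [23, 67]
```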
Optionally, a person of ordinary skill in the art will understand that the structure shown in Fig. 10 is only illustrative. The electronic device may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. Fig. 10 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in Fig. 10 (such as a network interface), or have a configuration different from that shown in Fig. 10.
The memory 1002 may be used to store software programs and modules, such as the program instructions/modules corresponding to the digital extraction method and apparatus in the embodiments of the present invention. The processor 1004 runs the software programs and modules stored in the memory 1002 to execute various functional applications and data processing, thereby implementing the above digital extraction method. The memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1002 may further include memories located remotely from the processor 1004, and these remote memories may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof. The memory 1002 may specifically be used, without limitation, to store information such as the instruction text and the target data. As an example, as shown in Fig. 10, the memory 1002 may include, but is not limited to, the acquiring unit 902, the processing unit 904, the determination unit 906, and the extraction unit 908 of the above digital extraction device. It may also include, but is not limited to, other module units of the digital extraction device, which are not described again in this example.
Optionally, the transmitting device 1006 is used to receive or send data via a network. Specific examples of the network may include wired networks and wireless networks. In one example, the transmitting device 1006 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router by a network cable so as to communicate with the Internet or a local area network. In another example, the transmitting device 1006 is a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
In addition, the electronic device further includes a display 1008 for displaying content such as the target number, and a connection bus 1010 for connecting the module components of the electronic device.
According to yet another aspect of the embodiments of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, and the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtain the instruction text that matches the input inquiry instruction;
S2, perform word segmentation and labeling on the instruction text to obtain an instruction participle set, where each instruction participle in the instruction participle set is configured with a part-of-speech label;
S3, determine the target instruction participles from the instruction participle set according to the part-of-speech labels, where a target instruction participle contains effective digital information;
S4, according to the positional relationship between the target instruction participles included in the instruction participle set, extract from the instruction text the target number that matches the effective digital information, where the target number is a number that a machine is allowed to recognize.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtain the number format of the numbers carried in all the effective digital information included in the instruction participle set;
S2, when the number format includes Chinese-character numerals, determine the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction participles included in the instruction participle set;
S3, extract the target number according to the extraction mode.
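The decision in the three steps above (inspect the number format, then choose an extraction mode from the token positions) can be sketched as a small dispatcher. The mode names, the "m" tag, the character set used to recognize Chinese-character numerals, and the function names are assumptions. The sketch also treats every Chinese-character numeral token as integer-typed, which the combination mode requires.

```python
# Assumed set of characters that make up Chinese-character integer numerals.
CN_NUMERAL_CHARS = set("零一二两三四五六七八九十百千万亿")


def number_format(token):
    """Classify a number token as "arabic", "chinese" or "other"."""
    if token.isascii() and token.replace(".", "", 1).isdigit():
        return "arabic"
    if token and all(ch in CN_NUMERAL_CHARS for ch in token):
        return "chinese"
    return "other"


def choose_extraction_mode(tagged_tokens, number_tag="m"):
    """Pick "direct", "combination" or "discrete" for the number tokens."""
    targets = [(i, tok) for i, (tok, tag) in enumerate(tagged_tokens) if tag == number_tag]
    if not targets:
        return "none"
    if "chinese" not in {number_format(tok) for _, tok in targets}:
        return "direct"  # Arabic numerals are extracted as they are
    positions = [i for i, _ in targets]
    adjacent = any(b - a == 1 for a, b in zip(positions, positions[1:]))
    return "combination" if adjacent else "discrete"


print(choose_extraction_mode([("五百", "m"), ("万", "m")]))                # combination
print(choose_extraction_mode([("二十", "m"), ("点", "x"), ("三万", "m")]))  # discrete
print(choose_extraction_mode([("23", "m"), ("度", "q")]))                  # direct
```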
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, when the positions of at least two target instruction participles in the instruction participle set are contiguous and the data types of the effective digital information included in the at least two target instruction participles are all integer types, determine that the extraction mode for the at least two target instruction participles is the combination extraction mode;
S2, according to the combination extraction mode, combine the at least two target instruction participles to obtain a combined instruction field, and extract the target number that matches the combined instruction field.
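The combining step of the combination extraction mode can be sketched as grouping runs of adjacent number tokens and joining each run into one field. The tokens below are an assumed segmentation of a sentence about a profit exceeding five million, the "m" tag is an assumption, and `combine_contiguous` is a hypothetical name. Converting the combined field into a number is left to a separate numeral converter.

```python
def combine_contiguous(tagged_tokens, number_tag="m"):
    """Join each run of two or more adjacent number tokens into one combined field."""
    fields, run = [], []
    for tok, tag in tagged_tokens:
        if tag == number_tag:
            run.append(tok)
            continue
        if len(run) >= 2:
            fields.append("".join(run))
        run = []
    if len(run) >= 2:  # a run may end at the last token
        fields.append("".join(run))
    return fields


tokens = [("盈利", "v"), ("超过", "v"), ("五百", "m"), ("万", "m"),
          ("同比", "d"), ("增长", "v"), ("十", "m")]
print(combine_contiguous(tokens))  # ['五百万']
```

The lone token "十" is not returned because a single number token has nothing to be combined with and would be handled by another extraction mode.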
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, when the positions of the target instruction participles in the instruction participle set are discrete, determine that the extraction mode is the discrete extraction mode;
S2, according to the discrete extraction mode, separately extract the number carried in the effective digital information included in each target instruction participle in the instruction participle set, as the target number.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, obtain a first key participle and a second key participle in the instruction participle set, where the first key participle is adjacent to and located before the target instruction participle, and the second key participle is adjacent to and located after the target instruction participle;
S2, combine the first key participle, the target instruction participle, and the second key participle to obtain a candidate field;
S3, call a composite number template and compare it with the candidate field;
S4, when the candidate field matches the composite number template, extract the target number according to the composite number template.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following step:
S1, when the number format is Arabic numerals, extract the number carried in the effective digital information as the target number.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following step:
S1, obtain, from the instruction participle set, the instruction participles whose part-of-speech label indicates a number, as the target instruction participles, where an instruction participle whose part-of-speech label indicates a number contains effective digital information.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
(1) obtain an inquiry instruction input by voice, recognize the instruction information carried in the inquiry instruction, and generate the instruction text according to the instruction information;
(2) obtain an inquiry instruction input through an input device, and parse the inquiry instruction to obtain the instruction text.
Optionally, in this embodiment, a person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.
The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. A digital extraction method, characterized by comprising:
obtaining an instruction text that matches an input inquiry instruction;
performing word segmentation and labeling on the instruction text to obtain an instruction participle set, wherein each instruction participle in the instruction participle set is configured with a part-of-speech label;
determining a target instruction participle from the instruction participle set according to the part-of-speech label, wherein the target instruction participle contains effective digital information;
extracting, from the instruction text and according to a positional relationship between the target instruction participles included in the instruction participle set, a target number that matches the effective digital information, wherein the target number is a number that a machine is allowed to recognize.
2. The method according to claim 1, characterized in that the extracting, from the instruction text and according to the positional relationship between the target instruction participles included in the instruction participle set, the target number that matches the effective digital information comprises:
obtaining a number format of the numbers carried in all the effective digital information included in the instruction participle set;
when the number format includes Chinese-character numerals, determining an extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction participles included in the instruction participle set;
extracting the target number according to the extraction mode.
3. The method according to claim 2, characterized in that:
the determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction participles included in the instruction participle set comprises: when the positions of at least two target instruction participles in the instruction participle set are contiguous, and the data types of the effective digital information included in the at least two target instruction participles are all integer types, determining that the extraction mode for the at least two target instruction participles is a combination extraction mode;
the extracting the target number according to the extraction mode comprises: combining the at least two target instruction participles according to the combination extraction mode to obtain a combined instruction field; and extracting the target number that matches the combined instruction field.
4. The method according to claim 2, characterized in that:
the determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction participles included in the instruction participle set comprises: when the positions of the target instruction participles in the instruction participle set are discrete, determining that the extraction mode is a discrete extraction mode;
the extracting the target number according to the extraction mode comprises: separately extracting, according to the discrete extraction mode, the number carried in the effective digital information included in each target instruction participle in the instruction participle set, as the target number.
5. The method according to claim 2, characterized in that, before the determining the extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction participles included in the instruction participle set, the method further comprises:
obtaining a first key participle and a second key participle in the instruction participle set, wherein the first key participle is adjacent to and located before the target instruction participle, and the second key participle is adjacent to and located after the target instruction participle;
combining the first key participle, the target instruction participle, and the second key participle to obtain a candidate field;
calling a composite number template and comparing it with the candidate field;
when the candidate field matches the composite number template, extracting the target number according to the composite number template.
6. The method according to claim 2, characterized in that, after the obtaining the number format of the numbers carried in the effective digital information, the method further comprises:
when the number format is Arabic numerals, extracting the number carried in the effective digital information as the target number.
7. The method according to any one of claims 1 to 6, wherein the determining the target instruction participle from the instruction participle set according to the part-of-speech label comprises:
obtaining, from the instruction participle set, the instruction participle whose part-of-speech label indicates a number, as the target instruction participle, wherein the instruction participle whose part-of-speech label indicates a number contains the effective digital information.
8. The method according to any one of claims 1 to 6, characterized in that the obtaining the instruction text that matches the input inquiry instruction comprises at least one of the following:
obtaining the inquiry instruction input by voice; recognizing instruction information carried in the inquiry instruction; and generating the instruction text according to the instruction information;
obtaining the inquiry instruction input through an input device; and parsing the inquiry instruction to obtain the instruction text.
9. A digital extraction device, characterized by comprising:
an acquiring unit, configured to obtain an instruction text that matches an input inquiry instruction;
a processing unit, configured to perform word segmentation and labeling on the instruction text to obtain an instruction participle set, wherein each instruction participle in the instruction participle set is configured with a part-of-speech label;
a determination unit, configured to determine a target instruction participle from the instruction participle set according to the part-of-speech label, wherein the target instruction participle contains effective digital information;
an extraction unit, configured to extract, from the instruction text and according to a positional relationship between the target instruction participles included in the instruction participle set, a target number that matches the effective digital information, wherein the target number is a number that a machine is allowed to recognize.
10. The device according to claim 9, characterized in that the extraction unit comprises:
a first obtaining module, configured to obtain a number format of the numbers carried in all the effective digital information included in the instruction participle set;
a determining module, configured to determine, when the number format includes Chinese-character numerals, an extraction mode for the Chinese-character numerals according to the positional relationship between the target instruction participles included in the instruction participle set;
a first extraction module, configured to extract the target number according to the extraction mode.
11. The device according to claim 10, characterized in that:
the determining module comprises a first determining submodule, configured to determine, when the positions of at least two target instruction participles in the instruction participle set are contiguous and the data types of the effective digital information included in the at least two target instruction participles are all integer types, that the extraction mode for the at least two target instruction participles is a combination extraction mode;
the first extraction module comprises a first extraction submodule, configured to combine the at least two target instruction participles according to the combination extraction mode to obtain a combined instruction field, and to extract the target number that matches the combined instruction field.
12. The device according to claim 10, characterized in that:
the determining module comprises a second determining submodule, configured to determine, when the positions of the target instruction participles in the instruction participle set are discrete, that the extraction mode is a discrete extraction mode;
the first extraction module comprises a second extraction submodule, configured to separately extract, according to the discrete extraction mode, the number carried in the effective digital information included in each target instruction participle in the instruction participle set, as the target number.
13. The device according to claim 10, characterized in that the extraction unit further comprises:
a second obtaining module, configured to obtain, before the extraction mode for the Chinese-character numerals is determined according to the positional relationship between the target instruction participles included in the instruction participle set, a first key participle and a second key participle in the instruction participle set, wherein the first key participle is adjacent to and located before the target instruction participle, and the second key participle is adjacent to and located after the target instruction participle;
a combination module, configured to combine the first key participle, the target instruction participle, and the second key participle to obtain a candidate field;
a comparison module, configured to call a composite number template and compare it with the candidate field;
a second extraction module, configured to extract the target number according to the composite number template when the candidate field matches the composite number template.
14. A storage medium, comprising a stored program, wherein the program, when run, executes the method according to any one of claims 1 to 8.
15. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to execute, by means of the computer program, the method according to any one of claims 1 to 8.
CN201810961840.2A 2018-08-22 2018-08-22 Digital extraction method and apparatus, storage medium, and electronic apparatus Active CN109299439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810961840.2A CN109299439B (en) 2018-08-22 2018-08-22 Digital extraction method and apparatus, storage medium, and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810961840.2A CN109299439B (en) 2018-08-22 2018-08-22 Digital extraction method and apparatus, storage medium, and electronic apparatus

Publications (2)

Publication Number Publication Date
CN109299439A true CN109299439A (en) 2019-02-01
CN109299439B CN109299439B (en) 2021-05-11

Family

ID=65165415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810961840.2A Active CN109299439B (en) 2018-08-22 2018-08-22 Digital extraction method and apparatus, storage medium, and electronic apparatus

Country Status (1)

Country Link
CN (1) CN109299439B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330243A (en) * 2021-12-31 2022-04-12 北京执象科技发展有限公司 Method and device for identifying oral calculation result, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101196881A (en) * 2006-12-08 2008-06-11 富士通株式会社 Words symbolization processing method and system for number and special symbol string in text
US7836061B1 (en) * 2007-12-29 2010-11-16 Kaspersky Lab, Zao Method and system for classifying electronic text messages and spam messages
CN102184167A (en) * 2011-05-25 2011-09-14 安徽科大讯飞信息科技股份有限公司 Method and device for processing text data
CN102915313A (en) * 2011-08-05 2013-02-06 腾讯科技(深圳)有限公司 Error correction relation generation method and system in web search
CN107368466A (en) * 2017-06-27 2017-11-21 成都准星云学科技有限公司 A kind of name recognition methods and its system towards elementary mathematics field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李烯: "基于关键词共现的教育信息化工程发展初探", 《电化教育研究》 *

Also Published As

Publication number Publication date
CN109299439B (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN111190939B (en) User portrait construction method and device
CN107204184B (en) Speech recognition method and system
CN108447471A (en) Speech recognition method and speech recognition device
CN109766013A (en) Poetry sentence input recommendation method and device and electronic equipment
CN105095415B (en) Method and apparatus for determining network sentiment
CN106874253A (en) Method and device for recognizing sensitive information
CN103019407B (en) Input method application method, automatic question answering processing method, electronic equipment and server
CN109783624A (en) Knowledge-base-based answer generation method, device and intelligent conversational system
CN108305050A (en) Method, device, equipment and medium for extracting case-report information and service requirement information
CN104951807B (en) Method and apparatus for determining stock market sentiment
CN111292752A (en) User intention identification method and device, electronic equipment and storage medium
CN109033075A (en) Intent matching method, apparatus, storage medium and terminal device
CN107590291A (en) Picture search method, terminal device and storage medium
CN107741972A (en) Picture search method, terminal device and storage medium
CN109190119B (en) Time extraction method and device, storage medium and electronic device
CN113889074A (en) Voice generation method, device, equipment and medium
CN107330009A (en) Topic word classification model creation method, creation device and storage medium
CN109597987A (en) Text restoration method, device and electronic equipment
CN111179904A (en) Mixed text-to-speech conversion method and device, terminal and computer readable storage medium
CN113220854B (en) Intelligent dialogue method and device for machine reading and understanding
CN110222103A (en) Method and device for extracting Excel data, computer equipment and storage medium
CN110246494A (en) Service request method, device and computer equipment based on speech recognition
CN109299439A (en) Digital extraction method and apparatus, storage medium and electronic device
CN110895555B (en) Data retrieval method and device, storage medium and electronic device
CN116543798A (en) Emotion recognition method and device based on multiple classifiers, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant