CN109614538A - Method, device, and equipment for extracting agricultural product price data - Google Patents
- Publication number
- CN109614538A (application CN201811543073.XA)
- Authority
- CN
- China
- Prior art keywords
- agricultural product
- target
- product information
- text
- product price
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method, device, equipment, and computer-readable storage medium for extracting agricultural product price data. The method comprises: crawling target web pages to obtain corresponding texts, determining the texts whose content topic is agriculture to be target texts, and performing word segmentation on the target texts to obtain corresponding target phrases; extracting the agricultural product information contained in the target phrases as target agricultural product information, and judging whether the target agricultural product information contains an agricultural product price; if so, determining that data extraction is complete and storing the target agricultural product information in a database; if not, searching the database for agricultural product information associated with the target agricultural product information, determining the agricultural product price corresponding to the target agricultural product information based on the agricultural product information found and its association with the target agricultural product information, adding the determined price to the target agricultural product information, and saving the result to the database. This effectively solves the prior-art problem that extracted agricultural product price data are incomplete, with many missing parts.
Description
Technical field
The present invention relates to the field of data analysis technology, and more specifically to a method, device, equipment, and computer-readable storage medium for extracting agricultural product price data.
Background
With the development of information technology and the arrival of the big data era, data volumes have increased dramatically and data have penetrated every industry. How to obtain useful information from this mass of information is a matter of current concern. Data are obtained primarily to solve problems, and because much data cannot be obtained directly online, a web crawler is needed to crawl the required data.
China is a large agricultural country. The agricultural product market is an extremely important component of China's market economy, and agricultural product prices are the core of that market. Unstable agricultural product prices bring additional risk to practitioners in the agricultural product market and seriously harm the stable development of the entire market. Historical price data for agricultural products therefore usually need to be extracted for analysis. However, the historical agricultural product price data currently available from websites are incomplete, with many missing parts, which hinders extracting useful information from the data for further study.
In summary, the prior art has the problem that the extracted agricultural product price data are incomplete, with many missing parts.
Summary of the invention
The object of the present invention is to provide a method, device, equipment, and computer-readable storage medium for extracting agricultural product price data that can solve the prior-art problem that the extracted agricultural product price data are incomplete, with many missing parts.
To achieve the above object, the present invention provides the following technical solutions:
A method for extracting agricultural product price data, comprising:
crawling target web pages to obtain corresponding texts, determining the texts whose content topic is agriculture to be target texts, and performing word segmentation on the target texts to obtain corresponding target phrases;
extracting the agricultural product information contained in the target phrases as target agricultural product information, and judging whether the target agricultural product information contains an agricultural product price; if so, determining that data extraction is complete and storing the target agricultural product information in a database; if not, searching the database for agricultural product information associated with the target agricultural product information, determining the agricultural product price corresponding to the target agricultural product information based on the association between the agricultural product information found and the target agricultural product information, adding the determined agricultural product price to the target agricultural product information, and saving the result to the database.
Preferably, determining the agricultural product price corresponding to the target agricultural product information comprises:
searching the database for agricultural product information whose agricultural product price has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determining the agricultural product price corresponding to the target agricultural product information from the agricultural product price contained in the agricultural product information found and the corresponding ratio relationship.
Preferably, performing word segmentation on the target text to obtain the corresponding target phrases comprises:
inputting the target text into a pre-created word segmentation model and taking the output of the word segmentation model as the target phrases; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with multiple text samples and the phrases corresponding to each text sample.
Preferably, determining the texts whose content topic is agriculture among the crawled texts to be target texts comprises:
inputting each crawled text into a pre-created topic classification model, taking the content topic with the highest probability in the output of the topic classification model as the content topic of the corresponding text, and determining the texts whose content topic is agriculture to be target texts; wherein the topic classification model is obtained by training an LDA model in advance with multiple text samples and the content topic of each text sample.
Preferably, determining the target web pages comprises:
searching based on the agricultural product name of the agricultural product information for which data extraction is required, and determining the first preset number of web pages among the web pages obtained to be the target web pages.
Preferably, extracting the agricultural product information contained in the target phrases as the target agricultural product information comprises:
removing the stop words and punctuation marks from the target phrases to obtain the target agricultural product information.
A device for extracting agricultural product price data, comprising:
a crawling module, configured to: crawl target web pages to obtain corresponding texts, determine the texts whose content topic is agriculture to be target texts, and perform word segmentation on the target texts to obtain corresponding target phrases;
an extraction module, configured to: extract the agricultural product information contained in the target phrases as target agricultural product information, and judge whether the target agricultural product information contains an agricultural product price; if so, determine that data extraction is complete and store the target agricultural product information in a database; if not, search the database for agricultural product information associated with the target agricultural product information, determine the agricultural product price corresponding to the target agricultural product information based on the association between the agricultural product information found and the target agricultural product information, add the determined agricultural product price to the target agricultural product information, and save the result to the database.
Preferably, the extraction module comprises:
a determination unit, configured to: search the database for agricultural product information whose agricultural product price has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determine the agricultural product price corresponding to the target agricultural product information from the agricultural product price contained in the agricultural product information found and the corresponding ratio relationship.
Equipment for extracting agricultural product price data, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the method for extracting agricultural product price data according to any one of the above when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for extracting agricultural product price data according to any one of the above.
In the technical solution disclosed in the present application, target web pages are first crawled to obtain corresponding texts, and the texts on the agricultural topic are determined to be target texts. Word segmentation is then performed on the target texts to obtain corresponding target phrases, and the agricultural product information in the target phrases is extracted as target agricultural product information. If the target agricultural product information contains the corresponding agricultural product price, it is saved directly to the database. If the price is missing, the agricultural product price corresponding to the target agricultural product information is determined from other agricultural product information already saved in the database that is associated with the target agricultural product information, and the determined price is stored in the database together with the target agricultural product information. In this way, missing agricultural product prices are filled in and more complete agricultural product information is obtained, which effectively solves the prior-art problem that extracted agricultural product price data are incomplete, with many missing parts.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a method for extracting agricultural product price data provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the BiLSTM-CRF model in a method for extracting agricultural product price data provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a device for extracting agricultural product price data provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a flowchart of a method for extracting agricultural product price data provided by an embodiment of the present invention, the method may include:
S11: crawl target web pages to obtain corresponding texts, determine the texts whose content topic is agriculture to be target texts, and perform word segmentation on the target texts to obtain corresponding target phrases.
The method for extracting agricultural product price data provided by this embodiment may be executed by the corresponding extraction device. It should be noted that a target web page is a web page that needs to be crawled so that the required agricultural product price data can be extracted from the crawled content; any web page that needs to be crawled can serve as a target web page. After multiple texts have been crawled from the target web pages, the content topic of each text needs to be determined (content topics may include agriculture, economy, services, and so on). Texts whose content topic is agriculture generally contain agricultural product information (which may include the agricultural product name, the agricultural product price, and the place and time to which the price corresponds), so this embodiment determines the texts whose content topic is agriculture to be target texts and obtains the required data from the target texts. In addition, so that the required data can be extracted smoothly, after determining the target texts this embodiment performs word segmentation on them to obtain corresponding target phrases, from which the words of the agricultural product information are obtained.
S12: extract the agricultural product information contained in the target phrases as target agricultural product information, and judge whether the target agricultural product information contains an agricultural product price; if so, determine that data extraction is complete and store the target agricultural product information in the database; if not, search the database for agricultural product information associated with the target agricultural product information, determine the agricultural product price corresponding to the target agricultural product information based on the association between the agricultural product information found and the target agricultural product information, add the determined agricultural product price to the target agricultural product information, and save the result to the database.
After the target phrases are obtained, the words of the agricultural product information obtained from them are taken as the target agricultural product information. It can then be judged directly whether the target agricultural product information contains an agricultural product price. If so, the required agricultural product price data have been extracted, that is, extraction has succeeded. Otherwise, the required price data were not extracted; the missing agricultural product price must then be determined by some technique, filled into the corresponding agricultural product information, and saved for subsequent analysis. The missing price can be determined by searching the database for other saved agricultural product prices that have some association with it; the association may be in place, in time, or in kind. The value of the missing agricultural product price can then be determined from the saved agricultural product price and its association with the missing price, thereby filling in the agricultural product price.
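The branch in step S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: the database is replaced by an in-memory list, records are plain dictionaries, "associated" is assumed to mean "same product name with a saved price", and `find_related`, `store_with_price`, and the `estimate_price` callback are hypothetical names. Storing a record whose price could not be estimated is also an assumption; the text does not say what happens in that case.

```python
def find_related(db, target):
    """Return saved records that share the target's name and already have a price.

    Matching on the product name is an assumption; the text only says the
    association may be in place, time, or kind.
    """
    return [r for r in db if r["name"] == target["name"] and r.get("price") is not None]


def store_with_price(db, target, estimate_price):
    """Store `target` in `db`, filling a missing price from related records first."""
    record = dict(target)  # copy so the caller's dictionary is not modified
    if record.get("price") is None:
        related = find_related(db, record)
        if related:
            # `estimate_price` is a hypothetical callback: (record, related) -> price
            record["price"] = estimate_price(record, related)
    db.append(record)
    return record
```

For example, with one saved apple price and a callback that averages the related prices, a new apple record without a price is stored with the averaged value, while a record that already carries a price is stored unchanged.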
In the technical solution disclosed in the present application, target web pages are first crawled to obtain corresponding texts, and the texts on the agricultural topic are determined to be target texts. Word segmentation is then performed on the target texts to obtain corresponding target phrases, and the agricultural product information in the target phrases is extracted as target agricultural product information. If the target agricultural product information contains the corresponding agricultural product price, it is saved directly to the database. If the price is missing, the agricultural product price corresponding to the target agricultural product information is determined from other agricultural product information already saved in the database that is associated with the target agricultural product information, and the determined price is stored in the database together with the target agricultural product information. In this way, missing agricultural product prices are filled in and more complete agricultural product information is obtained, which effectively solves the prior-art problem that extracted agricultural product price data are incomplete, with many missing parts.
In the method for extracting agricultural product price data provided by an embodiment of the present invention, determining the agricultural product price corresponding to the target agricultural product information may include:
searching the database for agricultural product information whose agricultural product price has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determining the agricultural product price corresponding to the target agricultural product information from the agricultural product price contained in the agricultural product information found and the corresponding ratio relationship.
It should be noted that the ratio relationship between different agricultural product prices can be determined based on place, time, or kind. Specifically, when the ratio relationship is determined based on place, the prices of the same agricultural product at different places have a ratio relationship; when it is determined based on time, the prices of the same agricultural product at different times have a ratio relationship; when it is determined based on kind, the prices of different kinds of agricultural products have a ratio relationship. In each case, the corresponding agricultural product price saved in the database is multiplied by the ratio to obtain the agricultural product price corresponding to the target agricultural product information. The ratio relationship may be set by staff based on experience, or obtained by analyzing previously obtained agricultural product prices having the above characteristics (that is, different kinds of agricultural products, or the same agricultural product at different places or times), and can be expressed with a regular expression. In this way, the value that the missing data should have can be determined relatively accurately, achieving effective data filling.
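The multiplication described above can be illustrated with a short sketch. It assumes, purely for illustration, that each record is a dictionary with `kind`, `place`, `time`, and `price` fields and that the ratios are kept in a lookup table keyed by (dimension, reference value, target value). The function name, the field names, and the table layout are hypothetical and do not come from the source.

```python
DIMENSIONS = ("place", "time", "kind")


def fill_price_by_ratio(target, db, ratios):
    """Estimate a missing price as saved_price * ratio.

    A saved record qualifies as a reference when it differs from the target
    in exactly one dimension for which a ratio is known and matches the
    target in the other two dimensions. Returns None if no reference fits.
    """
    for ref in db:
        if ref.get("price") is None:
            continue
        for dim in DIMENSIONS:
            others = [d for d in DIMENSIONS if d != dim]
            if any(ref.get(d) != target.get(d) for d in others):
                continue
            ratio = ratios.get((dim, ref.get(dim), target.get(dim)))
            if ratio is not None:
                return ref["price"] * ratio
    return None
```

With a saved Beijing price of 6.0 and an assumed Beijing-to-Tianjin ratio of 0.9, the Tianjin price for the same product and month would be estimated as 6.0 × 0.9.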
In the method for extracting agricultural product price data provided by an embodiment of the present invention, performing word segmentation on the target text to obtain the corresponding target phrases may include:
inputting the target text into a pre-created word segmentation model and taking the output of the word segmentation model as the target phrases; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with multiple text samples and the phrases corresponding to each text sample.
It should be noted that multiple text samples and the phrases corresponding to each text sample can be obtained in advance, and a BiLSTM-CRF model trained with them to obtain a word segmentation model capable of performing word segmentation. The text samples are texts crawled from websites whose content topic is agriculture, and the phrases corresponding to each text sample are the phrases obtained by segmenting that text sample. The BiLSTM-CRF model works on the same principle as the corresponding prior-art scheme, and its structure can be as shown in Fig. 2. Specifically, the BiLSTM-CRF model has three layers:
(1) The first layer is a look-up layer, which maps each character from a one-hot vector to a character vector. A dropout layer can be placed before the input to the next layer to prevent overfitting.
(2) The second layer is a BiLSTM layer. The sequence obtained from the first layer is used as the input at each time step of the BiLSTM. The layer contains a forward LSTM and a backward LSTM, whose outputs are concatenated position by position to obtain the complete hidden state sequence. This sequence is then passed through a dropout layer followed by a linear layer to obtain the automatically extracted sentence features.
(3) The third layer is a CRF layer, which performs sequence labelling and thereby achieves word segmentation. The parameter of the CRF layer is a (k+2) × (k+2) matrix A, where A_ij is the transition score from the i-th label to the j-th label, so that labels already assigned can be used when labelling a position.
If a label sequence whose length equals the sentence length is denoted y = (y_1, y_2, ..., y_n), the model's score for labelling sentence x with y (the word segmentation result) is as follows. The formula is not reproduced in the source text; the standard form for this model is given here, with P_{i,y_i} denoting the second layer's score for label y_i at position i and y_0, y_{n+1} denoting the start and end labels:
$$\mathrm{score}(x, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n+1} A_{y_{i-1}, y_i}$$
A normalized probability is then obtained with softmax:
$$P(y \mid x) = \frac{\exp(\mathrm{score}(x, y))}{\sum_{y'} \exp(\mathrm{score}(x, y'))}$$
Training the whole model only requires maximizing its log-likelihood function. The whole process achieves word segmentation through classification. Thus, once the word segmentation model has been trained, the target text only needs to be input into the model to obtain the corresponding phrases. The segmentation process is simple and fast, and experiments show that word segmentation performed this way is relatively accurate.
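To make the CRF scoring step concrete, the sketch below computes a path score and its softmax-normalized probability in the standard linear-chain form, score(x, y) = Σ P[i][y_i] + Σ A[y_{i-1}][y_i]. It rests on assumptions: the source text does not reproduce its formulas, so the emission matrix `P`, the convention that index k is the start label and k+1 the end label, and both function names are illustrative. Normalization is done by brute-force enumeration, which is only practical for tiny examples; real implementations use the forward algorithm.

```python
import math
from itertools import product


def path_score(P, A, y):
    """Sum of emission scores P[i][y_i] and transition scores A[prev][next].

    P is an n x k matrix (positions x labels); A is (k+2) x (k+2), where
    index k is an assumed start label and k+1 an assumed end label.
    """
    k = len(P[0])
    tags = [k] + list(y) + [k + 1]  # wrap the path in start/end labels
    emission = sum(P[i][t] for i, t in enumerate(y))
    transition = sum(A[a][b] for a, b in zip(tags, tags[1:]))
    return emission + transition


def path_probability(P, A, y):
    """Softmax of path_score over all k**n label sequences (brute force)."""
    n, k = len(P), len(P[0])
    log_z = math.log(sum(math.exp(path_score(P, A, c))
                         for c in product(range(k), repeat=n)))
    return math.exp(path_score(P, A, y) - log_z)
```

Maximizing the log of `path_probability` for the gold label sequences is the log-likelihood objective mentioned in the text.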
In the method for extracting agricultural product price data provided by an embodiment of the present invention, determining the texts whose content topic is agriculture among the crawled texts to be target texts may include:
inputting each crawled text into a pre-created topic classification model, taking the content topic with the highest probability in the output of the topic classification model as the content topic of the corresponding text, and determining the texts whose content topic is agriculture to be target texts; wherein the topic classification model is obtained by training an LDA model in advance with multiple text samples and the content topic of each text sample.
It should be noted that multiple text samples and the content topic of each text sample can be obtained in advance, and an LDA model trained with them to obtain a topic classification model capable of determining content topics. The text samples are texts crawled from websites whose content topic is agriculture. After a crawled text is input into the topic classification model, the model outputs each content topic to which the text may correspond together with the percentage for each topic, and the content topic with the highest percentage is determined to be the content topic of that text. Once the topic classification model has been trained, a text only needs to be input into the model to determine its content topic, which is simple and fast, and experiments show that topic classification performed this way is relatively efficient and accurate.
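The selection rule (take the topic with the highest percentage, keep the text if that topic is agriculture) can be sketched as below. The trained model is deliberately left out: `classify` is a hypothetical callable standing in for the topic classification model and is assumed to return a dictionary mapping each topic to its probability. The topic label `"agriculture"` is likewise an assumed name.

```python
def select_agricultural_texts(texts, classify, target_topic="agriculture"):
    """Keep the texts whose highest-probability topic equals `target_topic`.

    `classify` is a hypothetical stand-in for the trained topic model:
    it takes a text and returns {topic: probability}.
    """
    selected = []
    for text in texts:
        distribution = classify(text)
        best_topic = max(distribution, key=distribution.get)
        if best_topic == target_topic:
            selected.append(text)
    return selected
```

Passing the model in as a callable keeps the filtering logic independent of how the topic probabilities are produced.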
In the method for extracting agricultural product price data provided by an embodiment of the present invention, determining the target web pages may include:
searching based on the agricultural product name of the agricultural product information for which data extraction is required, and determining the first preset number of web pages among the web pages obtained to be the target web pages.
The preset number can be set according to actual needs. The agricultural product name of the agricultural product whose information is to be obtained is determined and entered into a search box, and the search button is clicked automatically to obtain multiple web pages. Only the first preset number of web pages are taken as target web pages because higher-ranked search results are more likely to contain the information to be obtained. In addition, by crawling data from specific websites, the present application increases the likelihood that the data obtained are the required data.
Furthermore, the agricultural product name, time, and place may together be used as the search keywords, with the first preset number of web pages in the search results taken as the target web pages, so that the search results are more likely to meet the requirements.
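The two steps above, building the search keywords and keeping the first preset number of results, can be sketched as follows. The search itself is not shown; `result_urls` is assumed to be the ranked list of result URLs returned by whatever search the crawler drives. Both function names are hypothetical, and skipping duplicate URLs is an added assumption rather than something the text specifies.

```python
def build_query(name, time=None, place=None):
    """Join the product name with optional time and place keywords."""
    return " ".join(part for part in (name, time, place) if part)


def pick_target_pages(result_urls, preset_count):
    """Return the first `preset_count` distinct URLs in rank order."""
    seen, targets = set(), []
    for url in result_urls:
        if len(targets) >= preset_count:
            break
        if url not in seen:  # duplicate filtering is an assumption
            seen.add(url)
            targets.append(url)
    return targets
```

For instance, `build_query("apple", "2018-12", "Beijing")` yields the keyword string `"apple 2018-12 Beijing"`, and the resulting ranked list is then truncated by `pick_target_pages`.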
In the method for extracting agricultural product price data provided by an embodiment of the present invention, extracting the agricultural product information contained in the target phrases as the target agricultural product information may include:
removing the stop words and punctuation marks from the target phrases to obtain the target agricultural product information.
In the present application, to determine the agricultural product information contained in the target phrases, only the punctuation marks and stop words need to be removed. What remains in the target phrases is the core of the target text, such as time, place, and subject, so the target agricultural product information can be obtained quickly.
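A minimal sketch of this filtering step is shown below, assuming the target phrases arrive as a list of tokens. The stop-word list is a tiny hypothetical example, and punctuation is detected through Unicode general categories so that both ASCII and full-width marks are removed.

```python
import unicodedata

# Hypothetical stop-word list; a real system would load a fuller one.
STOP_WORDS = {"in", "the", "is", "of"}


def is_punctuation(token):
    """True when every character of the token is Unicode punctuation."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def extract_product_info(tokens, stop_words=STOP_WORDS):
    """Drop empty tokens, stop words, and punctuation-only tokens."""
    return [t for t in tokens
            if t and t.lower() not in stop_words and not is_punctuation(t)]
```

Tokens such as `"6.0"` survive because they contain non-punctuation characters, so prices are not stripped along with sentence punctuation.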
It should further be noted that, in the present application, after the agricultural product price in the target agricultural product information has been obtained, whether the price is reasonable can be judged manually or by other means. If it is not reasonable, the agricultural product price corresponding to the target agricultural product information is determined in the manner disclosed in the present application (according to the ratio relationship disclosed in the above embodiment), and the determined price replaces the extracted price; otherwise, the price need not be determined again. Information revision is realized in this way, ensuring the validity of the agricultural product information.
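One reading of this revision step is sketched below, under the assumption that a price is re-determined only when the extracted price fails the reasonableness check. Both callbacks are hypothetical: `is_reasonable` stands for the manual or automatic check, and `estimate` stands for the ratio-based determination described in the embodiment above.

```python
def revise_price(record, is_reasonable, estimate):
    """Return a copy of `record` whose price is replaced when it fails the check.

    `is_reasonable(record) -> bool` and `estimate(record) -> price` are
    hypothetical callbacks; the original record is left unmodified.
    """
    revised = dict(record)
    if revised.get("price") is not None and not is_reasonable(revised):
        revised["price"] = estimate(revised)
    return revised
```

A simple range test could serve as the automatic check, for example treating any price outside an expected interval for that product as unreasonable.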
An embodiment of the present invention also provides a device for extracting agricultural product price data which, as shown in Fig. 3, may include:
a crawling module 11, configured to: crawl target web pages to obtain corresponding texts, determine the texts whose content topic is agriculture to be target texts, and perform word segmentation on the target texts to obtain corresponding target phrases;
an extraction module 12, configured to: extract the agricultural product information contained in the target phrases as target agricultural product information, and judge whether the target agricultural product information contains an agricultural product price; if so, determine that data extraction is complete and store the target agricultural product information in a database; if not, search the database for agricultural product information associated with the target agricultural product information, determine the agricultural product price corresponding to the target agricultural product information based on the association between the agricultural product information found and the target agricultural product information, add the determined agricultural product price to the target agricultural product information, and save the result to the database.
In the device for extracting agricultural product price data provided by an embodiment of the present invention, the extraction module may include:
a determination unit, configured to: search the database for agricultural product information whose agricultural product price has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determine the agricultural product price corresponding to the target agricultural product information from the agricultural product price contained in the agricultural product information found and the corresponding ratio relationship.
In the device for extracting agricultural product price data provided by an embodiment of the present invention, the crawling module may include:
a word segmentation unit, configured to: input the target text into a pre-created word segmentation model and take the output of the word segmentation model as the target phrases; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with multiple text samples and the phrases corresponding to each text sample.
In the extraction device for agricultural product price data provided in an embodiment of the present invention, the crawling module may include:
A topic determination unit, configured to: input each crawled text into a pre-created topic classification model, determine the content topic with the highest probability in the result output by the topic classification model as the content topic of the corresponding text, and determine the texts whose content topic is the agriculture topic as target texts; wherein the topic classification model is obtained by training an LDA model in advance with multiple text samples and the content topic of each text sample.
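The selection rule described above (take the most probable topic and keep the text only if that topic is agriculture) could be sketched as follows. The topic probabilities are assumed to have already been produced by the trained topic classification model; the label `"agriculture"`, the function name `select_target_texts`, and the dictionary format of the model output are hypothetical choices made for illustration.

```python
from typing import Dict, List

AGRICULTURE = "agriculture"  # hypothetical label for the agriculture topic


def select_target_texts(
    texts: List[str], topic_probs: List[Dict[str, float]]
) -> List[str]:
    """Keep the texts whose highest-probability topic is agriculture."""
    targets = []
    for text, probs in zip(texts, topic_probs):
        # The content topic of a text is the topic with maximum probability.
        top_topic = max(probs, key=probs.get)
        if top_topic == AGRICULTURE:
            targets.append(text)
    return targets
```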
In the extraction device for agricultural product price data provided in an embodiment of the present invention, the crawling module may include:
A web page determination unit, configured to: perform a search based on the agricultural product name of the agricultural product information for which data extraction is required, and determine the first preset number of web pages among the web pages obtained as the target web pages.
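A minimal sketch of this step, assuming the search itself is supplied by the caller: `search` is a hypothetical callable that returns result URLs in ranked order, and the default `preset_count` of 10 is an arbitrary illustrative value rather than a number stated in the patent.

```python
from typing import Callable, List


def pick_target_pages(
    product_name: str,
    search: Callable[[str], List[str]],
    preset_count: int = 10,
) -> List[str]:
    """Search by product name and keep the first `preset_count` results."""
    results = search(product_name)  # assumed to be ordered by rank
    return results[:preset_count]
```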
In the extraction device for agricultural product price data provided in an embodiment of the present invention, the extraction module may include:
An extraction unit, configured to: remove the stop words and punctuation marks from the target phrase to obtain the target agricultural product information.
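The filtering performed by the extraction unit could look roughly like the sketch below. The stop-word set `STOP_WORDS` is a tiny hypothetical sample (a real system would load a full list), and detecting punctuation through Unicode general categories is one possible approach, not one prescribed by the patent.

```python
import unicodedata
from typing import Iterable, List

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"的", "了", "是"}


def is_punctuation(token: str) -> bool:
    """True if every character of the token is in a Unicode punctuation category."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def extract_product_info(phrases: Iterable[str]) -> List[str]:
    """Drop stop words and punctuation, keeping the remaining phrases in order."""
    return [
        p for p in phrases
        if p and p not in STOP_WORDS and not is_punctuation(p)
    ]
```

Tokens such as `2.5` are kept in this sketch because they are not made up of punctuation alone, so price values survive the filtering.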
An embodiment of the present invention further provides equipment for extracting agricultural product price data, which may include:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the method for extracting agricultural product price data described in any one of the above when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program can implement the steps of the method for extracting agricultural product price data described in any one of the above.
It should be noted that, for the description of the relevant parts of the extraction device, equipment, and computer-readable storage medium for agricultural product price data provided in the embodiments of the present invention, reference is made to the detailed description of the corresponding parts of the method for extracting agricultural product price data provided in the embodiments of the present invention, which is not repeated here. In addition, the parts of the above technical solutions provided in the embodiments of the present invention whose implementation principles are consistent with the corresponding technical solutions in the prior art are not described in detail, so as to avoid excessive repetition.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for extracting agricultural product price data, characterized by comprising:
crawling target web pages to obtain corresponding texts, determining the texts whose content topic is the agriculture topic as target texts, and performing word segmentation on the target text to obtain a corresponding target phrase;
extracting the agricultural product information included in the target phrase as target agricultural product information, and judging whether the target agricultural product information includes an agricultural product price; if so, determining that data extraction is complete and storing the target agricultural product information in a database; if not, searching the database for agricultural product information that has an association with the target agricultural product information, determining the agricultural product price corresponding to the target agricultural product information based on the association between the found agricultural product information and the target agricultural product information, adding the determined agricultural product price to the target agricultural product information, and then saving it to the database.
2. The method according to claim 1, characterized in that determining the agricultural product price corresponding to the target agricultural product information comprises:
searching the database for agricultural product information whose agricultural product price has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determining the agricultural product price corresponding to the target agricultural product information by using the agricultural product price included in the found agricultural product information and the corresponding ratio relationship.
3. The method according to claim 2, characterized in that performing word segmentation on the target text to obtain the corresponding target phrase comprises:
inputting the target text into a pre-created word segmentation model, and determining the result output by the word segmentation model as the target phrase; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with multiple text samples and the phrases corresponding to each text sample.
4. The method according to claim 3, characterized in that determining, among the crawled texts, the texts whose content topic is the agriculture topic as target texts comprises:
inputting each crawled text into a pre-created topic classification model, determining the content topic with the highest probability in the result output by the topic classification model as the content topic of the corresponding text, and determining the texts whose content topic is the agriculture topic as target texts; wherein the topic classification model is obtained by training an LDA model in advance with multiple text samples and the content topic of each text sample.
5. The method according to claim 4, characterized in that determining the target web pages comprises:
performing a search based on the agricultural product name of the agricultural product information for which data extraction is required, and determining the first preset number of web pages among the web pages obtained as the target web pages.
6. The method according to claim 5, characterized in that extracting the agricultural product information included in the target phrase as the target agricultural product information comprises:
removing the stop words and punctuation marks from the target phrase to obtain the target agricultural product information.
7. A device for extracting agricultural product price data, characterized by comprising:
a crawling module, configured to: crawl target web pages to obtain corresponding texts, determine the texts whose content topic is the agriculture topic as target texts, and perform word segmentation on the target text to obtain a corresponding target phrase;
an extraction module, configured to: extract the agricultural product information included in the target phrase as target agricultural product information, and judge whether the target agricultural product information includes an agricultural product price; if so, determine that data extraction is complete and store the target agricultural product information in a database; if not, search the database for agricultural product information that has an association with the target agricultural product information, determine the agricultural product price corresponding to the target agricultural product information based on the association between the found agricultural product information and the target agricultural product information, add the determined agricultural product price to the target agricultural product information, and then save it to the database.
8. The device according to claim 7, characterized in that the extraction module comprises:
a determination unit, configured to: search the database for agricultural product information whose agricultural product price has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determine the agricultural product price corresponding to the target agricultural product information by using the agricultural product price included in the found agricultural product information and the corresponding ratio relationship.
9. Equipment for extracting agricultural product price data, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the method for extracting agricultural product price data according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for extracting agricultural product price data according to any one of claims 1 to 6 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811543073.XA CN109614538A (en) | 2018-12-17 | 2018-12-17 | A kind of extracting method, device and the equipment of agricultural product price data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811543073.XA CN109614538A (en) | 2018-12-17 | 2018-12-17 | A kind of extracting method, device and the equipment of agricultural product price data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109614538A true CN109614538A (en) | 2019-04-12 |
Family
ID=66009539
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811543073.XA Pending CN109614538A (en) | 2018-12-17 | 2018-12-17 | A kind of extracting method, device and the equipment of agricultural product price data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109614538A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110099133A1 (en) * | 2009-10-28 | 2011-04-28 | Industrial Technology Research Institute | Systems and methods for capturing and managing collective social intelligence information |
CN104391963A (en) * | 2014-12-01 | 2015-03-04 | 北京中科创益科技有限公司 | Method for constructing correlation networks of keywords of natural language texts |
CN104516879A (en) * | 2013-09-26 | 2015-04-15 | Sap欧洲公司 | Method and system for managing database containing record with missing value |
CN105205099A (en) * | 2015-08-20 | 2015-12-30 | 中国农业大学 | Agricultural product price analysis method |
CN106407464A (en) * | 2016-10-12 | 2017-02-15 | 南京航空航天大学 | KNN-based improved missing data filling algorithm |
CN108664589A (en) * | 2018-05-08 | 2018-10-16 | 苏州大学 | Text message extracting method, device, system and medium based on domain-adaptive |
CN108920461A (en) * | 2018-06-26 | 2018-11-30 | 武大吉奥信息技术有限公司 | A kind of polymorphic type and entity abstracting method and device containing complex relationship |
Non-Patent Citations (12)
Title |
---|
刘晓刚: "农产品大数据的抓取和分析方法探索", 《农村经济与科技》, vol. 29, no. 19, 20 October 2018 (2018-10-20), pages 304 - 305 * |
孟繁疆等: "农产品价格主题搜索引擎的研究与实现", 《东北农业大学学报》, vol. 47, no. 09, 30 September 2016 (2016-09-30), pages 64 - 71 * |
张伟等: "利用数据挖掘技术建设农业智能综合信息服务平台", 《农业网络信息》, no. 08, 26 August 2011 (2011-08-26), pages 36 - 38 * |
杨晓东等: "基于Hadoop平台的农产品价格数据爬取和存储系统的研究", 《计算机应用与软件》, no. 03, 15 March 2017 (2017-03-15), pages 82 - 86 * |
杨雄钢: "基于web的农产品市场价格分析与预测信息系统设计与实现", 《农家参谋》, no. 17, 24 August 2018 (2018-08-24), pages 48 - 49 * |
王文生等: "农业大数据及其应用展望", 《江苏农业科学》, no. 09, 25 September 2015 (2015-09-25), pages 15 - 19 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||