CN109614538A - Method, device and equipment for extracting agricultural product price data


Info

Publication number
CN109614538A
Authority
CN
China
Prior art keywords
agricultural product
target
product information
text
product price
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811543073.XA
Other languages
Chinese (zh)
Inventor
王铭锋
左亚尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201811543073.XA
Publication of CN109614538A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method, device, equipment, and computer-readable storage medium for extracting agricultural product price data. The method comprises: crawling target webpages to obtain corresponding texts, determining the texts whose content topic is agriculture to be target texts, and performing word segmentation on the target texts to obtain corresponding target phrases; extracting the agricultural product information contained in the target phrases as target agricultural product information, and judging whether the target agricultural product information includes an agricultural product price; if so, determining that data extraction is complete and storing the target agricultural product information in a database; if not, searching the database for agricultural product information associated with the target agricultural product information, determining the agricultural product price corresponding to the target agricultural product information based on the agricultural product information found and its association with the target agricultural product information, adding the determined price to the target agricultural product information, and saving the result to the database. This effectively solves the problem in the prior art that extracted agricultural product price data are incomplete, with many missing parts.

Description

Method, device and equipment for extracting agricultural product price data
Technical field
The present invention relates to the technical field of data analysis, and more specifically to a method, device, equipment, and computer-readable storage medium for extracting agricultural product price data.
Background
With the development of information technology and the arrival of the big data era, data volumes have increased dramatically and data have penetrated every industry. How to obtain useful information from massive amounts of information is a problem of wide current concern. Data are obtained primarily to solve problems; because much data cannot be obtained directly online, web crawlers are needed to crawl the required data.
China is a large agricultural country. The agricultural product market is an extremely important part of China's market economy system, and agricultural product prices are the core of that market. Unstable agricultural product prices bring additional risk to market practitioners and seriously harm the stable development of the entire agricultural product market. It is therefore often necessary to extract historical price data of agricultural products for analysis. However, the historical agricultural product price data currently obtained from websites are incomplete, with many missing parts, which hinders extracting useful information from the data for further study.
In summary, the prior art has the problem that extracted agricultural product price data are incomplete, with many missing parts.
Summary of the invention
The object of the present invention is to provide a method, device, equipment, and computer-readable storage medium for extracting agricultural product price data, which can solve the problem in the prior art that extracted agricultural product price data are incomplete, with many missing parts.
To achieve the above object, the present invention provides the following technical solutions:
A method for extracting agricultural product price data, comprising:
crawling target webpages to obtain corresponding texts, determining the texts whose content topic is agriculture to be target texts, and performing word segmentation on the target texts to obtain corresponding target phrases;
extracting the agricultural product information contained in the target phrases as target agricultural product information, and judging whether the target agricultural product information includes an agricultural product price; if so, determining that data extraction is complete and storing the target agricultural product information in a database; if not, searching the database for agricultural product information associated with the target agricultural product information, determining the agricultural product price corresponding to the target agricultural product information based on the association between the agricultural product information found and the target agricultural product information, adding the determined price to the target agricultural product information, and saving the result to the database.
Preferably, determining the agricultural product price corresponding to the target agricultural product information comprises:
searching the database for the agricultural product information containing an agricultural product price that has a ratio relation with the price corresponding to the target agricultural product information, and determining the price corresponding to the target agricultural product information using the price contained in the agricultural product information found and the corresponding ratio relation.
Preferably, performing word segmentation on the target text to obtain corresponding target phrases comprises:
inputting the target text into a pre-created word segmentation model, and taking the output of the word segmentation model as the target phrases; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with multiple text samples and the phrases corresponding to each text sample.
Preferably, determining the crawled texts whose content topic is agriculture to be target texts comprises:
inputting each crawled text into a pre-created topic classification model, taking the content topic with the highest probability in the output of the topic classification model as the content topic of the corresponding text, and determining the texts whose content topic is agriculture to be target texts; wherein the topic classification model is obtained by training an LDA model in advance with multiple text samples and the content topic of each text sample.
Preferably, determining the target webpages comprises:
searching with the name of the agricultural product whose information is to be extracted, and taking the first preset number of webpages in the search results as the target webpages.
Preferably, extracting the agricultural product information contained in the target phrases as target agricultural product information comprises:
removing the stop words and punctuation marks from the target phrases to obtain the target agricultural product information.
A device for extracting agricultural product price data, comprising:
a crawling module, configured to: crawl target webpages to obtain corresponding texts, determine the texts whose content topic is agriculture to be target texts, and perform word segmentation on the target texts to obtain corresponding target phrases;
an extraction module, configured to: extract the agricultural product information contained in the target phrases as target agricultural product information, and judge whether the target agricultural product information includes an agricultural product price; if so, determine that data extraction is complete and store the target agricultural product information in a database; if not, search the database for agricultural product information associated with the target agricultural product information, determine the agricultural product price corresponding to the target agricultural product information based on the association between the agricultural product information found and the target agricultural product information, add the determined price to the target agricultural product information, and save the result to the database.
Preferably, the extraction module comprises:
a determination unit, configured to: search the database for the agricultural product information containing an agricultural product price that has a ratio relation with the price corresponding to the target agricultural product information, and determine the price corresponding to the target agricultural product information using the price contained in the agricultural product information found and the corresponding ratio relation.
Equipment for extracting agricultural product price data, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the above methods for extracting agricultural product price data when executing the computer program.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the above methods for extracting agricultural product price data.
In the technical solution disclosed in the present application, target webpages are first crawled to obtain corresponding texts, and the texts on the agriculture topic are determined to be target texts. Word segmentation is then performed on the target texts to obtain corresponding target phrases, and the agricultural product information in the target phrases is extracted as target agricultural product information. If the target agricultural product information includes the corresponding agricultural product price, it is saved to the database directly. If the price is missing, it is determined from other agricultural product information already saved in the database that is associated with the target agricultural product information, and the determined price is then stored in the database together with the target agricultural product information. In this way, missing agricultural product prices are filled in and more complete agricultural product information is obtained, which effectively solves the problem in the prior art that extracted agricultural product price data are incomplete, with many missing parts.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flow chart of a method for extracting agricultural product price data provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the BiLSTM-CRF model in a method for extracting agricultural product price data provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a device for extracting agricultural product price data provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which shows a flow chart of a method for extracting agricultural product price data provided by an embodiment of the present invention, the method may include:
S11: crawling target webpages to obtain corresponding texts, determining the texts whose content topic is agriculture to be target texts, and performing word segmentation on the target texts to obtain corresponding target phrases.
The method for extracting agricultural product price data provided by the embodiment of the present invention may be executed by the corresponding extraction device. A target webpage is a webpage that needs to be crawled so that the required agricultural product price data can be extracted from the crawled content; any webpage that needs to be crawled can serve as a target webpage. After multiple texts are obtained by crawling the target webpages, the content topic of each text must be determined (content topics may include agriculture, economy, the service industry, and so on). Texts whose content topic is agriculture generally contain agricultural product information (which may include the agricultural product name, the agricultural product price, and the place and time to which the price corresponds), so this embodiment determines the texts whose content topic is agriculture to be target texts and obtains the required data from them. In addition, to extract the required data smoothly, after determining the target texts this embodiment performs word segmentation on them to obtain corresponding target phrases, from which the words carrying agricultural product information are obtained.
S12: extracting the agricultural product information contained in the target phrases as target agricultural product information, and judging whether the target agricultural product information includes an agricultural product price; if so, determining that data extraction is complete and storing the target agricultural product information in the database; if not, searching the database for agricultural product information associated with the target agricultural product information, determining the agricultural product price corresponding to the target agricultural product information based on the association between the agricultural product information found and the target agricultural product information, adding the determined price to the target agricultural product information, and saving the result to the database.
After the target phrases are obtained, the words carrying agricultural product information are taken from them as the target agricultural product information, and it can be judged directly whether this information includes an agricultural product price. If so, the required agricultural product price data have been extracted successfully. Otherwise, the required price data were not extracted; the missing price must then be determined by some technique and filled into the corresponding agricultural product information before saving, for subsequent analysis. The missing price can be determined by searching the database for saved agricultural product prices that have some association with it. The association may be in place, in time, or in product type. The value of the missing price can then be determined from the saved price and its association with the missing price, thereby filling in the agricultural product price.
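The decision in step S12 can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the `ProductInfo` record, the in-memory list standing in for the database, and the `same_product_latest` association rule are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProductInfo:
    """Hypothetical record for one piece of agricultural product information."""
    name: str
    place: str
    date: str
    price: Optional[float] = None  # None means no price was extracted


def extract_and_store(
    info: ProductInfo,
    database: list,
    infer_price: Callable[[ProductInfo, list], Optional[float]],
) -> ProductInfo:
    """Store `info`, first filling a missing price from associated records."""
    if info.price is None:
        # Price missing: derive it from records associated with `info`.
        info.price = infer_price(info, database)
    database.append(info)
    return info


def same_product_latest(info: ProductInfo, database: list) -> Optional[float]:
    """Toy association rule (an assumption): reuse the most recently stored
    price of the same product."""
    for record in reversed(database):
        if record.name == info.name and record.price is not None:
            return record.price
    return None


db = [ProductInfo("cabbage", "Guangzhou", "2018-12-01", 2.4)]
filled = extract_and_store(
    ProductInfo("cabbage", "Foshan", "2018-12-02"), db, same_product_latest
)
print(filled.price)
```

Passing the association rule in as a function keeps the storage step independent of how the missing price is inferred, so a place-, time-, or type-based rule could be swapped in.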
In the method for extracting agricultural product price data provided by the embodiment of the present invention, determining the agricultural product price corresponding to the target agricultural product information may include:
searching the database for the agricultural product information containing an agricultural product price that has a ratio relation with the price corresponding to the target agricultural product information, and determining the price corresponding to the target agricultural product information using the price contained in the agricultural product information found and the corresponding ratio relation.
It should be noted that the ratio relation between different agricultural product prices may be determined based on place, time, or product type. When it is determined based on place, the prices of the same agricultural product at different places have a ratio relation. When it is determined based on time, the prices of the same agricultural product at different times have a ratio relation. When it is determined based on type, the prices of different types of agricultural products have a ratio relation. In each case, the corresponding agricultural product price saved in the database is multiplied by the ratio relation to obtain the price corresponding to the target agricultural product information. The ratio relation may be set by staff based on experience, or may be obtained in advance by data analysis of agricultural product prices having the above characteristics (that is, prices of different types of agricultural products, or prices of the same agricultural product at different places or times), and may be expressed as a regular expression. In this way, the value that the missing data should have can be determined fairly accurately, and the data are filled in effectively.
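As a rough illustration of the place-based case, the sketch below multiplies a stored reference price by a place-to-place ratio. The dictionaries, keys, and numbers are hypothetical; the patent does not specify how ratios are stored, and time- or type-based ratios would follow the same pattern.

```python
from typing import Optional

# Hypothetical stored prices keyed by (product, place, date), in yuan per kg.
stored_prices = {
    ("cabbage", "Guangzhou", "2018-12-01"): 2.0,
}

# Hypothetical ratios: target price = reference price * ratio.
place_ratios = {("Guangzhou", "Foshan"): 1.1}


def fill_by_place_ratio(product: str, place: str, date: str) -> Optional[float]:
    """Estimate a missing price from the same product on the same date
    at another place for which a ratio is known."""
    for (ref_product, ref_place, ref_date), ref_price in stored_prices.items():
        ratio = place_ratios.get((ref_place, place))
        if ref_product == product and ref_date == date and ratio is not None:
            return ref_price * ratio
    return None


estimate = fill_by_place_ratio("cabbage", "Foshan", "2018-12-01")
print(estimate)
```

When no stored record with a known ratio exists, the function returns `None`, leaving the price unfilled rather than guessing.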
In the method for extracting agricultural product price data provided by the embodiment of the present invention, performing word segmentation on the target text to obtain corresponding target phrases may include:
inputting the target text into a pre-created word segmentation model, and taking the output of the word segmentation model as the target phrases; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with multiple text samples and the phrases corresponding to each text sample.
It should be noted that multiple text samples and the phrases corresponding to each text sample can be obtained in advance, and the BiLSTM-CRF model is trained with these text samples and phrases to obtain a word segmentation model capable of word segmentation. The text samples are texts crawled from websites whose content topic is agriculture, and the phrases corresponding to each text sample are the phrases obtained by performing word segmentation on that sample. The BiLSTM-CRF model works on the same principle as the corresponding prior-art scheme, and its structure may be as shown in Fig. 2. Specifically, the BiLSTM-CRF model is divided into three layers. (1) The first layer is the look-up layer, which maps each character from a one-hot vector to a character vector; a dropout layer may be set before the input to the next layer to prevent over-fitting. (2) The second layer is the BiLSTM layer, which takes the sequence obtained from the first layer as the input of each time step. This layer includes a forward LSTM and a backward LSTM; the outputs of the two are concatenated position by position to obtain the complete hidden state sequence. After the configured dropout layer, a linear layer is applied to obtain the automatically extracted sentence features. (3) The third layer is the CRF layer, which performs sequence labeling to achieve word segmentation. The parameter of the CRF layer is a (k+2) × (k+2) matrix A, where A_{ij} denotes the transition score from the i-th label to the j-th label, so that labels already assigned to earlier positions can be used when labeling a position. For a label sequence y = (y_1, y_2, ..., y_n) whose length equals the sentence length, the model's score for labeling sentence x with y (the word segmentation result) is:
$$\mathrm{score}(x, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n+1} A_{y_{i-1}, y_i}$$
where P_{i, y_i} is the score output by the BiLSTM layer for label y_i at position i, and y_0 and y_{n+1} are the start and end labels that account for the two extra rows and columns of A. A normalized probability is then obtained using softmax:
$$p(y \mid x) = \frac{\exp(\mathrm{score}(x, y))}{\sum_{y'} \exp(\mathrm{score}(x, y'))}$$
Training the whole model only requires maximizing its log-likelihood function. The whole process realizes word segmentation through classification. Thus, once the word segmentation model has been trained, the target text only needs to be input into the model to obtain the corresponding phrases. The segmentation process is simple and fast, and experiments show that word segmentation performed in this way is highly accurate.
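A small numeric sketch of the CRF scoring and softmax normalization described above, assuming the standard BiLSTM-CRF score (per-position scores from the BiLSTM plus transition scores from A, with extra start and end states). The two-label tag set, the matrices, and all numbers are illustrative assumptions, and the normalizer is computed by brute-force enumeration only because the example is tiny; practical implementations use dynamic programming instead.

```python
import itertools
import math

K = 2                   # number of real labels, e.g. 0 = word start, 1 = inside a word
START, END = K, K + 1   # two extra states, which is why A is (K+2) x (K+2)

# Illustrative per-position scores P[i][label] for a 3-character sentence.
P = [[2.0, 0.5], [0.3, 1.5], [1.2, 0.8]]
# Illustrative transition scores A[i][j] from label i to label j.
A = [
    [0.1, 0.9, 0.0, 0.2],
    [0.6, 0.4, 0.0, 0.3],
    [0.8, -1.0, 0.0, 0.0],  # transitions out of START
    [0.0, 0.0, 0.0, 0.0],   # transitions out of END (never used)
]


def score(labels):
    """Per-position scores plus transition scores, including START and END."""
    path = [START, *labels, END]
    emission = sum(P[i][y] for i, y in enumerate(labels))
    transition = sum(A[a][b] for a, b in zip(path, path[1:]))
    return emission + transition


def probability(labels):
    """Softmax of the path score over every possible label sequence."""
    all_paths = itertools.product(range(K), repeat=len(P))
    log_z = math.log(sum(math.exp(score(p)) for p in all_paths))
    return math.exp(score(labels) - log_z)


# The highest-scoring label sequence is the predicted segmentation.
best = max(itertools.product(range(K), repeat=len(P)), key=score)
print(best, round(probability(best), 3))
```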
In the method for extracting agricultural product price data provided by the embodiment of the present invention, determining the crawled texts whose content topic is agriculture to be target texts may include:
inputting each crawled text into a pre-created topic classification model, taking the content topic with the highest probability in the output of the topic classification model as the content topic of the corresponding text, and determining the texts whose content topic is agriculture to be target texts; wherein the topic classification model is obtained by training an LDA model in advance with multiple text samples and the content topic of each text sample.
It should be noted that multiple text samples and the content topic corresponding to each text sample can be obtained in advance, and the LDA model is trained with these text samples and content topics to obtain a topic classification model capable of determining content topics. The text samples are texts crawled from websites whose content topic is agriculture. After a text is input into the topic classification model, the output includes each content topic the text may correspond to and the percentage for each content topic, and the content topic with the highest percentage is determined to be the content topic of the text. Once the topic classification model has been trained in this way, a text only needs to be input into the model to determine its content topic, which is simple and fast, and experiments show that topic classification performed in this way is efficient and accurate.
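The selection rule (take the most probable topic, keep the text if that topic is agriculture) can be sketched as follows. The `classify` callable stands in for a trained topic model and the fixed probability tables are hypothetical; no real LDA library is used here.

```python
def pick_topic(topic_probabilities: dict) -> str:
    """Return the topic with the highest probability."""
    return max(topic_probabilities, key=topic_probabilities.get)


def select_target_texts(texts, classify, wanted="agriculture"):
    """Keep the texts whose most probable topic is `wanted`."""
    return [t for t in texts if pick_topic(classify(t)) == wanted]


# Stand-in for a trained topic model: hypothetical fixed outputs per text.
fake_outputs = {
    "cabbage prices rose this week": {"agriculture": 0.7, "economy": 0.2, "services": 0.1},
    "bank cuts interest rates": {"agriculture": 0.1, "economy": 0.8, "services": 0.1},
}
targets = select_target_texts(list(fake_outputs), fake_outputs.get)
print(targets)
```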
In the method for extracting agricultural product price data provided by the embodiment of the present invention, determining the target webpages may include:
searching with the name of the agricultural product whose information is to be extracted, and taking the first preset number of webpages in the search results as the target webpages.
The preset quantity can be set according to actual needs. The name of the agricultural product whose information is to be obtained is determined and entered into a search box, and the search button is clicked automatically to obtain multiple webpages. Only the first preset number of webpages are used as target webpages because higher-ranked search results are more likely to contain the required information. In addition, by crawling data from these specific websites, the present application increases the likelihood that the acquired data are the required data.
Furthermore, the agricultural product name, time, and place may all be used as search keywords, with the first preset number of webpages in the search results used as target webpages, so that the search results are more likely to meet the requirements.
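A minimal sketch of the two pieces described above: building the search keywords and keeping the first preset number of results. The function names and the result list are hypothetical; a real crawler would obtain the ranked list from a search engine.

```python
def build_query(name, time=None, place=None):
    """Join the product name with optional time and place keywords."""
    return " ".join(part for part in (name, time, place) if part)


def top_pages(search_results, preset_quantity):
    """Keep the first `preset_quantity` results, i.e. the highest-ranked ones."""
    return search_results[:preset_quantity]


# Hypothetical ranked result list.
results = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
print(build_query("cabbage", "2018-12", "Guangzhou"))
print(top_pages(results, 2))
```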
In the method for extracting agricultural product price data provided by the embodiment of the present invention, extracting the agricultural product information contained in the target phrases as target agricultural product information may include:
removing the stop words and punctuation marks from the target phrases to obtain the target agricultural product information.
In the present application, to determine the agricultural product information contained in the target phrases, only the punctuation marks and stop words need to be removed. What remains in the target phrases is the main content of the target text, such as the time, place, and subject, so the target agricultural product information can be obtained quickly.
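The filtering step can be sketched as below. The stop-word list is a tiny hypothetical sample, and punctuation is detected through Unicode general categories so that both Western and Chinese punctuation marks are removed while tokens such as prices and dates are kept.

```python
import unicodedata

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "of", "in", "is", "was", "a"}


def is_punctuation(token: str) -> bool:
    """True if every character is in a Unicode punctuation category (P*)."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def clean_phrases(tokens):
    """Drop stop words and punctuation, keeping the remaining tokens in order."""
    return [
        t for t in tokens
        if t.lower() not in STOP_WORDS and not is_punctuation(t)
    ]


tokens = ["The", "price", "of", "cabbage", "in", "Guangzhou", "was",
          "2.4", "yuan", ",", "2018-12-01", "。"]
print(clean_phrases(tokens))
```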
It should further be noted that, in the present application, when an agricultural product price is obtained from the target agricultural product information, whether the price is reasonable can be judged manually or by other means. If it is not reasonable, the agricultural product price corresponding to the target agricultural product information is determined in the manner disclosed in the present application (the above embodiments disclose determining it according to a ratio relation), and the determined price replaces the extracted price; otherwise, the price does not need to be determined again. Information revision is realized in this way, which ensures the validity of the agricultural product information.
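One possible reading of this revision step is sketched below: an extracted price that looks implausible is replaced by an estimate obtained from associated records. The relative-deviation threshold is purely an assumption made for illustration; the patent leaves the reasonableness check to manual or other means.

```python
def revise_price(extracted, estimate, tolerance=0.5):
    """Return `estimate` when `extracted` deviates from it by more than
    `tolerance` (relative); otherwise keep `extracted`."""
    if estimate is None or estimate <= 0:
        return extracted  # nothing to compare against
    deviation = abs(extracted - estimate) / estimate
    return estimate if deviation > tolerance else extracted


print(revise_price(24.0, 2.2))  # implausibly far from the estimate
print(revise_price(2.4, 2.2))   # close enough to keep
```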
The embodiment of the invention also provides a kind of extraction elements of agricultural product price data, as shown in figure 3, may include:
Module 11 is crawled, is used for: obtaining corresponding text by crawling in target webpage, determines that its content topic is main for agricultural The text of topic is target text, and carries out word segmentation processing to the target text and obtain corresponding target phrase;
Extraction module 12, is used for: extracting the agricultural product information for including in the target phrase is target agricultural product information, is sentenced Break whether comprising agricultural product price in the target agricultural product information, completed if it is, determining that data are extracted, and by the mesh Mark agricultural product information is stored in into database, if it is not, then being had by searching in the database with the target agricultural product information The agricultural product information of relevant property, and determined based on the relevance of the agricultural product information and the target agricultural product information that find The corresponding agricultural product price of the target agricultural product information, and the target agricultural product are added in the agricultural product price determined and are believed It saves after breath to the database.
A kind of extraction element of agricultural product price data provided in an embodiment of the present invention, the extraction module may include:
Determination unit is used for: being had by searching agricultural product price corresponding with the target agricultural product information in the database There is the affiliated agricultural product information of the agricultural product price of ratio relation, and utilizes the agricultural product valence for including in the agricultural product information found Lattice and corresponding ratio relation determine the corresponding agricultural product price of the target agricultural product information.
In the apparatus for extracting agricultural product price data provided by an embodiment of the present invention, the crawling module may include:
A word segmentation unit, configured to: input the target text into a pre-created word segmentation model, and determine the result output by the word segmentation model as the target phrases; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with a plurality of text samples and the phrases corresponding to each text sample.
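The disclosure does not state how the output of the BiLSTM-CRF model is converted into phrases. Sequence-labeling segmenters commonly emit one tag per character, so the following sketch assumes a B/M/E/S tag scheme (begin, middle, end, single-character word) and shows only the decoding step; the tag scheme and the function name are assumptions, and the trained model itself is not reproduced here.

```python
from typing import List


def tags_to_words(text: str, tags: List[str]) -> List[str]:
    """Group characters into words using per-character B/M/E/S tags."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    words: List[str] = []
    current = ""
    for char, tag in zip(text, tags):
        current += char
        if tag in ("E", "S"):
            # A word closes on its end character or on a single-character word.
            words.append(current)
            current = ""
    if current:
        # Keep a trailing fragment if the tag sequence ended mid-word.
        words.append(current)
    return words
```

In this sketch the tag sequence would come from the segmentation model; any tagger producing the same scheme could be substituted.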
In the apparatus for extracting agricultural product price data provided by an embodiment of the present invention, the crawling module may include:
A topic determination unit, configured to: input each crawled text into a pre-created topic classification model, determine the content topic with the highest probability in the result output by the topic classification model as the content topic of the corresponding text, and determine the text whose content topic is agriculture as the target text; wherein the topic classification model is obtained by training an LDA model in advance with a plurality of text samples and the content topic of each text sample.
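The selection rule described above (take the most probable topic, keep the text only if that topic is agriculture) can be sketched as follows. The `classify` callable stands in for the trained topic classification model and is assumed to return a mapping from topic label to probability; that interface and the label `"agriculture"` are assumptions made for this sketch.

```python
from typing import Callable, Dict, List


def select_agricultural_texts(
    texts: List[str],
    classify: Callable[[str], Dict[str, float]],
    target_topic: str = "agriculture",
) -> List[str]:
    """Keep the texts whose most probable content topic equals `target_topic`."""
    selected: List[str] = []
    for text in texts:
        probabilities = classify(text)
        if not probabilities:
            # The model produced no topics for this text; skip it.
            continue
        # The content topic of a text is the one with the highest probability.
        best_topic = max(probabilities, key=probabilities.get)
        if best_topic == target_topic:
            selected.append(text)
    return selected
```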
In the apparatus for extracting agricultural product price data provided by an embodiment of the present invention, the crawling module may include:
A web page determination unit, configured to: perform a search based on the agricultural product name of the agricultural product information for which data extraction is required, and determine the first preset number of web pages among the obtained web pages as the target web pages.
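As a brief sketch of this step, the search can be treated as an injected function that returns result URLs in ranked order, from which the first preset number are kept. The `search` callable, the default count of 10, and the function name are assumptions; the disclosure does not name a search engine or a specific preset number.

```python
from typing import Callable, List


def select_target_pages(
    product_name: str,
    search: Callable[[str], List[str]],
    preset_count: int = 10,
) -> List[str]:
    """Search by product name and keep the first `preset_count` result pages."""
    if preset_count < 0:
        raise ValueError("preset_count must be non-negative")
    results = search(product_name)
    # Results are assumed to be ranked already, so slicing keeps the top entries.
    return results[:preset_count]
```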
In the apparatus for extracting agricultural product price data provided by an embodiment of the present invention, the extraction module may include:
An extraction unit, configured to: remove the stop words and punctuation marks from the target phrases to obtain the target agricultural product information.
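The filtering performed by the extraction unit can be illustrated with the sketch below. The stop-word set shown is a small hypothetical sample, and treating a token as punctuation when every character falls in a Unicode punctuation category is an assumption of this sketch rather than a rule stated in the disclosure.

```python
import unicodedata
from typing import Iterable, List, Set

# Hypothetical stop-word sample; a real list would be considerably larger.
STOP_WORDS: Set[str] = {"的", "了", "是"}


def is_punctuation(token: str) -> bool:
    """Return True if every character of the token is Unicode punctuation."""
    return bool(token) and all(
        unicodedata.category(char).startswith("P") for char in token
    )


def clean_phrases(phrases: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop empty tokens, stop words, and punctuation-only tokens."""
    return [
        phrase
        for phrase in phrases
        if phrase and phrase not in stop_words and not is_punctuation(phrase)
    ]
```

Because the punctuation test requires every character to be punctuation, a numeric token such as "2.5" is retained as a candidate price.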
An embodiment of the present invention further provides a device for extracting agricultural product price data, which may include:
A memory, configured to store a computer program;
A processor, configured to implement, when executing the computer program, the steps of the method for extracting agricultural product price data according to any one of the above.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program can implement the steps of the method for extracting agricultural product price data according to any one of the above.
It should be noted that, for the relevant parts of the apparatus, device, and computer-readable storage medium for extracting agricultural product price data provided by the embodiments of the present invention, reference may be made to the detailed description of the corresponding parts of the method for extracting agricultural product price data provided by the embodiments of the present invention, which is not repeated here. In addition, the parts of the above technical solutions whose implementation principles are consistent with the corresponding technical solutions in the prior art are not described in detail, to avoid excessive repetition.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for extracting agricultural product price data, characterized by comprising:
crawling target web pages to obtain the corresponding texts, determining the text whose content topic is agriculture as the target text, and performing word segmentation on the target text to obtain the corresponding target phrases;
extracting the agricultural product information contained in the target phrases as target agricultural product information, and judging whether the target agricultural product information includes an agricultural product price; if so, determining that data extraction is complete and storing the target agricultural product information in a database; if not, searching the database for agricultural product information that is associated with the target agricultural product information, determining the agricultural product price corresponding to the target agricultural product information based on the association between the found agricultural product information and the target agricultural product information, adding the determined agricultural product price to the target agricultural product information, and saving the result to the database.
2. The method according to claim 1, characterized in that determining the agricultural product price corresponding to the target agricultural product information comprises:
searching the database for the agricultural product information containing an agricultural product price that has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determining the agricultural product price corresponding to the target agricultural product information using the agricultural product price contained in the found agricultural product information and the corresponding ratio relationship.
3. The method according to claim 2, characterized in that performing word segmentation on the target text to obtain the corresponding target phrases comprises:
inputting the target text into a pre-created word segmentation model, and determining the result output by the word segmentation model as the target phrases; wherein the word segmentation model is obtained by training a BiLSTM-CRF model in advance with a plurality of text samples and the phrases corresponding to each text sample.
4. The method according to claim 3, characterized in that determining, among the crawled texts, the text whose content topic is agriculture as the target text comprises:
inputting each crawled text into a pre-created topic classification model, determining the content topic with the highest probability in the result output by the topic classification model as the content topic of the corresponding text, and determining the text whose content topic is agriculture as the target text; wherein the topic classification model is obtained by training an LDA model in advance with a plurality of text samples and the content topic of each text sample.
5. The method according to claim 4, characterized in that determining the target web pages comprises:
performing a search based on the agricultural product name of the agricultural product information for which data extraction is required, and determining the first preset number of web pages among the obtained web pages as the target web pages.
6. The method according to claim 5, characterized in that extracting the agricultural product information contained in the target phrases as target agricultural product information comprises:
removing the stop words and punctuation marks from the target phrases to obtain the target agricultural product information.
7. An apparatus for extracting agricultural product price data, characterized by comprising:
a crawling module, configured to: crawl target web pages to obtain the corresponding texts, determine the text whose content topic is agriculture as the target text, and perform word segmentation on the target text to obtain the corresponding target phrases;
an extraction module, configured to: extract the agricultural product information contained in the target phrases as target agricultural product information, and judge whether the target agricultural product information includes an agricultural product price; if so, determine that data extraction is complete and store the target agricultural product information in a database; if not, search the database for agricultural product information that is associated with the target agricultural product information, determine the agricultural product price corresponding to the target agricultural product information based on the association between the found agricultural product information and the target agricultural product information, add the determined agricultural product price to the target agricultural product information, and save the result to the database.
8. The apparatus according to claim 7, characterized in that the extraction module comprises:
a determination unit, configured to: search the database for the agricultural product information containing an agricultural product price that has a ratio relationship with the agricultural product price corresponding to the target agricultural product information, and determine the agricultural product price corresponding to the target agricultural product information using the agricultural product price contained in the found agricultural product information and the corresponding ratio relationship.
9. A device for extracting agricultural product price data, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the method for extracting agricultural product price data according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when executed by a processor the computer program implements the steps of the method for extracting agricultural product price data according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811543073.XA CN109614538A (en) 2018-12-17 2018-12-17 A kind of extracting method, device and the equipment of agricultural product price data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811543073.XA CN109614538A (en) 2018-12-17 2018-12-17 A kind of extracting method, device and the equipment of agricultural product price data

Publications (1)

Publication Number Publication Date
CN109614538A true CN109614538A (en) 2019-04-12

Family

ID=66009539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811543073.XA Pending CN109614538A (en) 2018-12-17 2018-12-17 A kind of extracting method, device and the equipment of agricultural product price data

Country Status (1)

Country Link
CN (1) CN109614538A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110099133A1 (en) * 2009-10-28 2011-04-28 Industrial Technology Research Institute Systems and methods for capturing and managing collective social intelligence information
CN104391963A (en) * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts
CN104516879A (en) * 2013-09-26 2015-04-15 Sap欧洲公司 Method and system for managing database containing record with missing value
CN105205099A (en) * 2015-08-20 2015-12-30 中国农业大学 Agricultural product price analysis method
CN106407464A (en) * 2016-10-12 2017-02-15 南京航空航天大学 KNN-based improved missing data filling algorithm
CN108664589A (en) * 2018-05-08 2018-10-16 苏州大学 Text message extracting method, device, system and medium based on domain-adaptive
CN108920461A (en) * 2018-06-26 2018-11-30 武大吉奥信息技术有限公司 A kind of polymorphic type and entity abstracting method and device containing complex relationship

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
刘晓刚: "农产品大数据的抓取和分析方法探索", 《农村经济与科技》, vol. 29, no. 19, 20 October 2018 (2018-10-20), pages 304 - 305 *
孟繁疆等: "农产品价格主题搜索引擎的研究与实现", 《东北农业大学学报》, vol. 47, no. 09, 30 September 2016 (2016-09-30), pages 64 - 71 *
张伟等: "利用数据挖掘技术建设农业智能综合信息服务平台", 《农业网络信息》, no. 08, 26 August 2011 (2011-08-26), pages 36 - 38 *
杨晓东等: "基于Hadoop平台的农产品价格数据爬取和存储系统的研究", 《计算机应用与软件》, no. 03, 15 March 2017 (2017-03-15), pages 82 - 86 *
杨雄钢: "基于web的农产品市场价格分析与预测信息系统设计与实现", 《农家参谋》, no. 17, 24 August 2018 (2018-08-24), pages 48 - 49 *
王文生等: "农业大数据及其应用展望", 《江苏农业科学》, no. 09, 25 September 2015 (2015-09-25), pages 15 - 19 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination