CN105205099B - A kind of agricultural product price analysis method - Google Patents
Agricultural product price analysis method
- Publication number
- CN105205099B (application CN201510516065.6A)
- Authority
- CN
- China
- Prior art keywords
- agricultural product
- information
- parallel construction
- default
- producing region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Forestry; Mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Mining & Mineral Resources (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Health & Medical Sciences (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Marine Sciences & Fisheries (AREA)
- Animal Husbandry (AREA)
- Agronomy & Crop Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an agricultural product price analysis method comprising the following steps: training a combined classifier using a preset search engine, and obtaining agricultural product variety information with the combined classifier; crawling, from a preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products; dividing the agricultural products into producing regions according to the variety information and the suppliers' geographical location information, to obtain the producing region information of the agricultural products of each variety; and, according to the producing region information of each variety and the price information, displaying the agricultural product price of that variety by region using geographic information system (GIS) technology. The invention can provide sufficient and more accurate information for business decisions about agricultural products.
Description
Technical field
The present invention relates to the technical field of agricultural product price analysis, and in particular to an agricultural product price analysis method.
Background art
The agricultural product market is an important part of the national economy and bears on political and social stability. Strengthening the analysis of agricultural product price information, and obtaining both the changes in agricultural product prices and the differences between regions, is of great significance for stabilizing the agricultural product market and for providing scientific and accurate decision information to government departments, agricultural product wholesalers and agricultural producers. Government departments can apply appropriate macro-control according to price changes and regional differences, so as to better plan the layout of agricultural production, adjust its structure, organize the sale of agricultural products, balance supply and demand between regions, avoid large price fluctuations and maintain market stability, and thereby effectively address the problems of the rural economy, rural development and the rural population. Agricultural product wholesalers can adjust their operating and sales strategies according to price fluctuations to obtain greater profit. Agricultural producers can adjust the varieties they grow according to supply and demand, to avoid unsalable products that would reduce their income.
With the deepening reform of the socialist market economy and the rapid development of the internet, agricultural product prices are increasingly affected by market operating conditions and the circulation environment. Online trading of agricultural products is becoming more and more common, and agricultural product transaction data is growing sharply. The price of an agricultural product often varies significantly with its variety, place of production and place of sale, and how to mine these data thoroughly to obtain the relationship among the three has become a research hotspot.
At present, many online quotation platforms already exist in China, but they have the following problems:
First, they do not distinguish between varieties. For example, a quotation platform can often only provide the price of watermelon, not the price of each specific watermelon variety;
Second, they do not distinguish between regions. For example, quotation platforms often do not provide the place of production of agricultural products.
Such data cannot provide sufficient information for business decisions.
Summary of the invention
The technical problem to be solved by the present invention is that existing quotation platforms do not quote prices by specific agricultural product variety and place of production, and therefore cannot provide sufficient information for business decisions.
To this end, the present invention proposes an agricultural product price analysis method comprising the following steps:
training a combined classifier using a preset search engine, and obtaining agricultural product variety information with the combined classifier;
crawling, from a preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products;
dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety;
displaying the agricultural product price of each variety by region using geographic information system (GIS) technology, according to the producing region information of the agricultural products of that variety and the price information of the agricultural products.
Preferably, training a combined classifier using a preset search engine and obtaining agricultural product variety information with the combined classifier specifically includes:
crawling the text corresponding to search terms on the preset search engine, and extracting the parallel structures in the text, where a parallel structure is a run of text items separated by the enumeration comma (、);
selecting, from the extracted parallel structures, parallel structures of agricultural product varieties of a preset type and parallel structures not of the preset type as the training sample data set;
extracting the text before and after each parallel structure of agricultural product varieties of the preset type as feature information;
training multiple base classifiers on the feature information using a support vector machine classification algorithm, and combining the base classifiers to form the combined classifier;
obtaining the agricultural product variety information using the combined classifier.
Preferably, crawling the corresponding text on the preset search engine and extracting the parallel structures in the text specifically includes:
constructing a thesaurus of basic varieties for agricultural products of a preset type, and using the words in the thesaurus as seed words, where the thesaurus contains a preset number of basic varieties of the agricultural products of the preset type;
downloading, through the preset search engine, the entries corresponding to the basic variety words in the thesaurus, and extracting all text information in the entries;
extracting all parallel structures in the text information using regular expressions.
Preferably, selecting, from the extracted parallel structures, parallel structures of agricultural product varieties of the preset type and parallel structures not of the preset type as the training sample data set specifically includes:
scoring each parallel item in a parallel structure according to preset scoring rules;
calculating the mean of the scores of the parallel items in each parallel structure, and taking the mean as the score of that parallel structure;
comparing the score of each parallel structure with a preset first threshold: if the score reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset type; otherwise it is a parallel structure not of the preset type;
where the preset scoring rules include:
searching in the preset search engine with a preset search format, and if the parallel item appears in the thesaurus, the parallel item scores 1 point;
searching in the preset search engine with the preset search format, and if the result entry contains the parallel item, the parallel item scores 0.8 points;
searching in the preset search engine with the preset search format, and if the mutual information between the preset search format and the result entry reaches a preset second threshold, the parallel item scores 0.5 points;
the preset search format is: the parallel item + a space + the category of the agricultural product of the preset type.
Preferably, extracting the text before and after each parallel structure of agricultural product varieties of the preset type as feature information specifically includes:
extracting the sentence containing the parallel structure of agricultural product varieties of the preset type, removing the parallel structure itself, and using the words in the remaining text as first sub-feature information;
extracting the text of the sentence before and the sentence after the sentence containing the parallel structure, and using the words in that text as second sub-feature information;
using the first sub-feature information and the second sub-feature information as the feature information.
Preferably, training multiple base classifiers on the feature information using the support vector machine classification algorithm and combining the base classifiers to form the combined classifier specifically includes:
dividing, by a preset criterion, the preset-type parallel structures in the training sample data set into N parts and the non-preset-type parallel structures into L parts;
randomly selecting N-1 parts of the preset-type parallel structures and K parts of the non-preset-type parallel structures as training samples, using the remaining 1 part of the preset-type parallel structures and the remaining L-K parts of the non-preset-type parallel structures as test samples, and learning with the support vector machine classification algorithm to obtain one base classifier;
repeating the previous step M times to obtain M base classifiers;
combining the M base classifiers to obtain the combined classifier.
Preferably, obtaining the agricultural product variety information using the combined classifier specifically includes:
using the combined classifier to judge whether a parallel structure under test is a parallel structure of agricultural product varieties of the preset type; if so, adding the agricultural product varieties in that parallel structure to the corresponding agricultural product category, thereby obtaining the agricultural product variety information.
Preferably, crawling, from the preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products specifically includes:
crawling the geographical location information of the agricultural product suppliers from static web pages of the preset commodity trading website using a static URL crawling method;
or,
crawling the geographical location information of the agricultural product suppliers from dynamic web pages of the preset commodity trading website using the Selenium tool.
Preferably, before dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety to obtain the producing region information of the agricultural products of each variety, the method further includes:
obtaining geographic information on China's administrative regions, where the geographic information is divided into four levels: the first level is a province or a municipality directly under the central government, the second level is a prefecture-level city, the third level is a district or county-level city, and the fourth level is a subdistrict or township.
Preferably, dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, specifically includes:
making the geographical location information of the agricultural product suppliers consistent according to the level order in the geographic information;
for the agricultural product variety information, calculating the producing region weight of each variety in each of its places of production according to the number of times that place of production appears for that variety in online advertisements;
taking the place of production with the largest producing region weight, together with the places of production that lie within a preset range of that place and whose producing region weight exceeds a preset third threshold, as the first main producing region;
determining, in turn, a preset number of further main producing regions among the places of production outside the first main producing region.
With the agricultural product price analysis method disclosed in the present invention, specific agricultural product varieties can be combined with the price differences of agricultural products across regions, and GIS technology is used to display the average price of each agricultural product variety in each producing region, providing sufficient and more accurate information for business decisions.
Brief description of the drawings
The features and advantages of the present invention will be understood more clearly by reference to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:
Fig. 1 shows a flow chart of the agricultural product price analysis method provided by an embodiment of the present invention;
Fig. 2 shows a flow chart of training the combined classifier using a preset search engine and obtaining agricultural product variety information in an embodiment of the present invention;
Fig. 3 shows a flow chart of training multiple base classifiers using the support vector machine classification algorithm and forming the combined classifier in an embodiment of the present invention;
Fig. 4 shows a flow chart of dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, in an embodiment of the present invention;
Fig. 5 shows a schematic diagram of the nationwide prices of Red Fuji apples in an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of the agricultural product price analysis method provided by an embodiment of the present invention. As shown in Fig. 1, the agricultural product price analysis method of this embodiment includes the following steps:
A1: training a combined classifier using a preset search engine, and obtaining agricultural product variety information with the combined classifier;
A2: crawling, from a preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products;
A3: dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety;
A4: displaying the agricultural product price of each variety by region using geographic information system (GIS) technology, according to the producing region information of the agricultural products of that variety and the price information of the agricultural products.
The agricultural product price analysis method provided by the present invention combines specific agricultural product varieties with the differences between agricultural products across regions, and uses GIS technology to display the average price of each agricultural product variety in each producing region, providing sufficient and more accurate information for business decisions.
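The per-region average price that step A4 feeds into the GIS display can be sketched as a simple grouping. This is a minimal illustration only: the `(variety, region, price)` record layout and the function name `average_price_by_region` are assumptions, and the map rendering itself is not shown.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_region(offers):
    """Group (variety, region, price) records and average the prices.

    The record layout is a hypothetical one chosen for this sketch.
    Returns {(variety, region): mean price}.
    """
    grouped = defaultdict(list)
    for variety, region, price in offers:
        grouped[(variety, region)].append(price)
    return {key: mean(prices) for key, prices in grouped.items()}


# Made-up prices, for illustration only.
offers = [
    ("红富士", "山东", 5.0),
    ("红富士", "山东", 6.0),
    ("红富士", "陕西", 4.0),
]
print(average_price_by_region(offers))
```

Each resulting value could then be attached to the polygon of its producing region in whatever GIS layer is used for display.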
Fig. 2 shows a flow chart of training the combined classifier using a preset search engine and obtaining agricultural product variety information in an embodiment of the present invention. As shown in Fig. 2, the step of training a combined classifier using a preset search engine and obtaining agricultural product variety information with the combined classifier in this embodiment specifically includes:
B1: crawling the text corresponding to search terms on the preset search engine, and extracting the parallel structures in the text, where a parallel structure is a run of text items separated by the enumeration comma (、);
B2: selecting, from the extracted parallel structures, parallel structures of agricultural product varieties of a preset type and parallel structures not of the preset type as the training sample data set;
B3: extracting the text before and after each parallel structure of agricultural product varieties of the preset type as feature information;
B4: training multiple base classifiers on the feature information using a support vector machine classification algorithm, and combining the base classifiers to form the combined classifier;
B5: obtaining the agricultural product variety information using the combined classifier.
Further, crawling the corresponding text on the preset search engine and extracting the parallel structures in the text specifically includes:
S1: constructing a thesaurus of basic varieties for agricultural products of a preset type, and using the words in the thesaurus as seed words, where the thesaurus contains a preset number of basic varieties of the agricultural products of the preset type;
S2: downloading, through the preset search engine, the entries corresponding to the basic variety words in the thesaurus, and extracting all text information in the entries;
S3: extracting all parallel structures in the text information using regular expressions.
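Step S3 can be illustrated with a minimal sketch, assuming that a parallel structure is a run of two or more short terms joined by the enumeration comma "、". The pattern, the eight-character limit per term and the function name `extract_parallel_structures` are illustrative assumptions, not the regular expression actually used in the patent.

```python
import re

# Assumption: a term is 1-8 CJK characters or ASCII letters/digits, and a
# parallel structure is at least two terms joined by the enumeration comma.
ITEM = r"[\u4e00-\u9fffA-Za-z0-9]{1,8}"
PARALLEL = re.compile(rf"{ITEM}(?:、{ITEM})+")


def extract_parallel_structures(text):
    """Return each parallel structure in text as a list of its items."""
    return [match.group(0).split("、") for match in PARALLEL.finditer(text)]


sample = "品种:红富士、嘎啦、金帅。"
print(extract_parallel_structures(sample))  # → [['红富士', '嘎啦', '金帅']]
```

When no punctuation separates the list from the surrounding sentence, the first and last items of a match can absorb neighbouring words, so a real pipeline would likely need word segmentation or stricter boundaries.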
Further, selecting, from the extracted parallel structures, parallel structures of agricultural product varieties of the preset type and parallel structures not of the preset type as the training sample data set specifically includes:
scoring each parallel item in a parallel structure according to preset scoring rules;
calculating the mean of the scores of the parallel items in each parallel structure, and taking the mean as the score of that parallel structure;
comparing the score of each parallel structure with a preset first threshold: if the score reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset type; otherwise it is a parallel structure not of the preset type;
where the preset scoring rules include:
searching in the preset search engine with a preset search format, and if the parallel item appears in the thesaurus, the parallel item scores 1 point;
searching in the preset search engine with the preset search format, and if the result entry contains the parallel item, the parallel item scores 0.8 points;
searching in the preset search engine with the preset search format, and if the mutual information between the preset search format and the result entry reaches a preset second threshold, the parallel item scores 0.5 points;
the preset search format is: the parallel item + a space + the category of the agricultural product of the preset type.
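The scoring rules above can be sketched as follows. The text does not say whether the three rules are exclusive, so this sketch assumes they are tried in order and that an item matching none of them scores 0. The search engine lookups are replaced by hypothetical callables (`entry_contains`, `mutual_information`), and all threshold values in the example are made up.

```python
from statistics import mean


def score_item(item, thesaurus, entry_contains, mutual_information, second_threshold):
    """Score one item of a parallel structure.

    Assumption: the rules are tried in order and the first match wins;
    an item that matches no rule scores 0.
    """
    if item in thesaurus:
        return 1.0
    if entry_contains(item):  # hypothetical search-result lookup
        return 0.8
    if mutual_information(item) >= second_threshold:  # hypothetical MI lookup
        return 0.5
    return 0.0


def label_structure(items, first_threshold, **rule_inputs):
    """Return (score, is_preset_type) for one parallel structure."""
    score = mean(score_item(item, **rule_inputs) for item in items)
    return score, score >= first_threshold


# Stub lookups standing in for real search engine queries.
rules = dict(
    thesaurus={"红富士", "嘎啦"},
    entry_contains=lambda item: item == "金帅",
    mutual_information=lambda item: 0.0,
    second_threshold=0.3,
)
print(label_structure(["红富士", "嘎啦", "金帅"], first_threshold=0.6, **rules))
```

Structures that reach the first threshold would go into the training set as positive samples and the rest as negative samples.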
Preferably, extracting the text before and after each parallel structure of agricultural product varieties of the preset type as feature information specifically includes:
extracting the sentence containing the parallel structure of agricultural product varieties of the preset type, removing the parallel structure itself, and using the words in the remaining text as first sub-feature information;
extracting the text of the sentence before and the sentence after the sentence containing the parallel structure, and using the words in that text as second sub-feature information;
using the first sub-feature information and the second sub-feature information as the feature information.
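The two kinds of context feature can be sketched as below. The sentence splitting on "。!?" and the character-level `tokenize` function are stand-ins chosen to keep the example self-contained; a real implementation would presumably use a proper Chinese word segmenter, which the text does not name.

```python
import re


def tokenize(text):
    # Stand-in for a real Chinese word segmenter: one token per CJK
    # character or per run of ASCII letters/digits.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)


def context_features(text, structure):
    """Return (first_sub_features, second_sub_features) for a structure.

    first:  tokens of the containing sentence with the structure removed
    second: tokens of the sentences immediately before and after it
    """
    sentences = re.findall(r"[^。!?]+[。!?]?", text)
    for i, sentence in enumerate(sentences):
        if structure in sentence:
            first = tokenize(sentence.replace(structure, ""))
            neighbours = sentences[max(i - 1, 0):i] + sentences[i + 1:i + 2]
            second = [tok for s in neighbours for tok in tokenize(s)]
            return first, second
    return [], []


text = "苹果好吃。品种有红富士、嘎啦。价格高。"
print(context_features(text, "红富士、嘎啦"))
```

The two token lists together would form the feature information passed to the classifier, for example as a bag-of-words vector.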
Fig. 3 shows a flow chart of training multiple base classifiers using the support vector machine classification algorithm and forming the combined classifier in an embodiment of the present invention. As shown in Fig. 3, training multiple base classifiers on the feature information using the support vector machine classification algorithm and combining the base classifiers to form the combined classifier specifically includes the following steps:
S01: dividing, by a preset criterion, the preset-type parallel structures in the training sample data set into N parts and the non-preset-type parallel structures into L parts;
S02: randomly selecting N-1 parts of the preset-type parallel structures and K parts of the non-preset-type parallel structures as training samples, using the remaining 1 part of the preset-type parallel structures and the remaining L-K parts of the non-preset-type parallel structures as test samples, and learning with the support vector machine classification algorithm to obtain one base classifier;
S03: repeating step S02 M times to obtain M base classifiers;
S04: combining the M base classifiers to obtain the combined classifier.
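The sampling and combination scheme of steps S01 to S04 can be sketched as below. Two points are assumptions: the text does not state how the base classifiers are combined, so an unweighted majority vote is used, and the base learner here is a nearest-centroid stand-in rather than a support vector machine, so that the example runs with the standard library only. Evaluation on the held-out test parts is omitted.

```python
import random


def split_into_parts(samples, parts):
    """Divide samples into `parts` roughly equal groups (step S01)."""
    return [samples[i::parts] for i in range(parts)]


def train_ensemble(positives, negatives, n_parts, l_parts, k_parts, rounds,
                   train_fn, seed=0):
    """Train `rounds` (M) base classifiers on random part selections."""
    rng = random.Random(seed)
    pos_parts = split_into_parts(positives, n_parts)
    neg_parts = split_into_parts(negatives, l_parts)
    classifiers = []
    for _ in range(rounds):
        held_out = rng.randrange(n_parts)  # the 1 positive part kept for testing
        neg_train = set(rng.sample(range(l_parts), k_parts))  # K negative parts
        train = [(x, 1) for i, part in enumerate(pos_parts)
                 if i != held_out for x in part]
        train += [(x, 0) for i, part in enumerate(neg_parts)
                  if i in neg_train for x in part]
        classifiers.append(train_fn(train))
    return classifiers


def predict_majority(classifiers, x):
    """Step S04, assuming an unweighted majority vote."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0


def nearest_centroid(train):
    """Stand-in base learner (not an SVM): pick the closer class mean."""
    def centroid(label):
        rows = [x for x, y in train if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return lambda x: 1 if dist(x, c1) < dist(x, c0) else 0
```

Any callable that takes a list of `(features, label)` pairs and returns a predictor can be passed as `train_fn`, so an SVM implementation could be substituted for `nearest_centroid` without changing the sampling code.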
Further, obtaining the agricultural product variety information using the combined classifier specifically includes:
using the combined classifier to judge whether a parallel structure under test is a parallel structure of agricultural product varieties of the preset type; if so, adding the agricultural product varieties in that parallel structure to the corresponding agricultural product category, thereby obtaining the agricultural product variety information.
Preferably, crawling, from the preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products specifically includes:
crawling the geographical location information of the agricultural product suppliers from static web pages of the preset commodity trading website using a static URL crawling method;
or,
crawling the geographical location information of the agricultural product suppliers from dynamic web pages of the preset commodity trading website using the Selenium tool.
Here, Selenium is a tool for testing web applications.
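For the static case, a minimal sketch might download a page with the standard library and pull out the supplier location and price. The page markup assumed here (elements with the classes `offer`, `location` and `price`) is entirely hypothetical, since the text does not describe the trading website's structure.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class OfferParser(HTMLParser):
    """Collect 'location' and 'price' text per offer (hypothetical markup)."""

    FIELDS = {"location", "price"}

    def __init__(self):
        super().__init__()
        self.offers = []    # one dict per offer element
        self._field = None  # field whose text is expected next

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "offer":
            self.offers.append({})
        elif css_class in self.FIELDS and self.offers:
            self._field = css_class

    def handle_data(self, data):
        if self._field and data.strip():
            self.offers[-1][self._field] = data.strip()
            self._field = None


def parse_offers(html):
    """Return a list of {'location': ..., 'price': ...} dicts."""
    parser = OfferParser()
    parser.feed(html)
    return parser.offers


def fetch_static_page(url):
    """Download a static page; the caller supplies the listing URL."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


page = '<div class="offer"><span class="location">山东省 烟台市</span><span class="price">5.20</span></div>'
print(parse_offers(page))
```

For dynamic pages, the same parser could in principle be fed the rendered HTML that Selenium exposes as `driver.page_source` after the page has loaded.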
As a preferred option in this embodiment, before dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety to obtain the producing region information of the agricultural products of each variety, the method further includes:
obtaining geographic information on China's administrative regions, where the geographic information is divided into four levels: the first level is a province or a municipality directly under the central government, the second level is a prefecture-level city, the third level is a district or county-level city, and the fourth level is a subdistrict or township.
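One way to hold the four-level geographic information is a lookup from place name to level and parent, which also lets a supplier's free-text address be expanded into a full path from the province down. The three-entry `GAZETTEER` and the function `normalize_location` below are illustrative assumptions; a real table would cover all administrative regions.

```python
# Tiny illustrative gazetteer: place name -> (level, parent name).
# Levels: 1 province/municipality, 2 prefecture-level city,
# 3 district/county-level city, 4 subdistrict/township.
GAZETTEER = {
    "山东省": (1, None),
    "烟台市": (2, "山东省"),
    "栖霞市": (3, "烟台市"),
}


def normalize_location(raw):
    """Return the path (province first) of the most specific place in raw.

    Returns None when no known place name occurs in the text.
    """
    named = [name for name in GAZETTEER if name in raw]
    if not named:
        return None
    current = max(named, key=lambda name: GAZETTEER[name][0])
    path = []
    while current is not None:
        path.append(current)
        current = GAZETTEER[current][1]
    return tuple(reversed(path))


print(normalize_location("烟台栖霞市苹果基地"))  # → ('山东省', '烟台市', '栖霞市')
```

With every address expanded to the same kind of path, locations reported at different levels can be compared or rolled up to a common level.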
Fig. 4 shows a flow chart of dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, in an embodiment of the present invention.
As shown in Fig. 4, dividing the agricultural products into producing regions according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, specifically includes:
SS1: making the geographical location information of the agricultural product suppliers consistent according to the level order in the geographic information;
SS2: for the agricultural product variety information, calculating the producing region weight of each variety in each of its places of production according to the number of times that place of production appears for that variety in online advertisements;
SS3: taking the place of production with the largest producing region weight, together with the places of production that lie within a preset range of that place and whose producing region weight exceeds a preset third threshold, as the first main producing region;
SS4: determining, in turn, a preset number of further main producing regions among the places of production outside the first main producing region.
Taking fruit as the preset category of agricultural product, the specific implementation of the agricultural product price analysis method provided by this embodiment is described in detail below:
Step 1: download the encyclopedia entries for fruits using a preset search engine (for example, Baidu Baike), crawl the parallel structures separated by enumeration commas in the corresponding text, and use a voting mechanism to select fruit-variety parallel structures and non-fruit-variety parallel structures as the training sample data set for learning the classifier.
The training sample data set is obtained as follows:
(1) First, a basic fruit variety thesaurus is constructed from experience, and the words in it serve as the seed words from which the subsequent work starts. The thesaurus contains the 15 to 30 most common varieties of each fruit; for example, the seed thesaurus for apples is shown in Table 1:
Table 1. Seed thesaurus for apples
Red Fuji | Gala |
Huaniu | A Suke |
Red Marshal | Jonagold |
Jinguan | Luochuan |
Guoguang | Wanglin |
Qinguan | Red Star |
Guoguang | Red General |
(2) The Baidu Baike entries corresponding to these fruit varieties are downloaded, and all text in each entry is extracted.
(3) All parallel structures in the above text are extracted using a regular expression. A parallel structure here is a run of words separated by enumeration commas (the Chinese "、"). For example, for the sentence "Adults and nymphs suck the juice of okra fruits, tender leaves, and tender shoots; after the fruit is attacked, the flesh of the damaged part hollows out, leaving uneven, deformed, corky fruit", the regular expression matches and extracts "adults and nymphs suck the juice of okra fruits, tender leaves, and tender shoots".
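Step (3) can be sketched in Python as follows. This is a minimal sketch rather than the embodiment's own regular expression, which the text does not give: it assumes that clause boundaries are ordinary Chinese or ASCII punctuation and that any clause containing the enumeration comma "、" counts as a parallel structure. The function name and the sample sentence are made up for illustration.

```python
import re

# Assumed clause delimiters; the patent does not state its actual pattern.
CLAUSE_BOUNDARY = re.compile(r"[，。；！？,.;!?\n]")
ENUM_COMMA = "、"


def extract_parallel_structures(text):
    """Return (clause, items) for every clause that contains "、"."""
    found = []
    for clause in CLAUSE_BOUNDARY.split(text):
        clause = clause.strip()
        if ENUM_COMMA not in clause:
            continue
        items = [part.strip() for part in clause.split(ENUM_COMMA) if part.strip()]
        if len(items) >= 2:
            found.append((clause, items))
    return found


# Illustrative sentence (not taken from the patent).
sample = "常见品种有红富士、嘎啦、乔纳金，口感各异。"
structures = extract_parallel_structures(sample)
```

In this sketch the first item keeps the leading words of its clause ("常见品种有红富士"), so the raw items would probably need further cleaning before being treated as variety names.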
(4) Each parallel item in a parallel structure is scored according to the preset scoring rules; the mean of the scores obtained by the parallel items in each parallel structure is calculated and taken as the score of that parallel structure; the score of each parallel structure is then compared with a preset first threshold. If the score reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset category; otherwise it is a parallel structure of agricultural product varieties not of the preset category.
The preset scoring rules are as follows:
R1: a search is performed in the preset search engine using the preset search format; if the parallel item appears in the thesaurus, the parallel item is awarded 1 point;
R2: a search is performed in the preset search engine using the preset search format; if the result entry contains the parallel item, the parallel item is awarded 0.8 points. Here the preset search format is: parallel item + space + category name of the preset agricultural product category;
In this embodiment, the search is performed in Baidu Baike using the format "parallel item" + space + "fruit category". For example, for the parallel item "Red Fuji", the query "Red Fuji apple" is searched among Baidu Baike entries; if the result is "Red Fuji apple", the result contains the parallel item "Red Fuji", so the parallel item "Red Fuji" is awarded 0.8 points.
R3: a search is performed in the preset search engine using the preset search format; if the mutual information between the preset search format and the result entry reaches a preset second threshold, the parallel item is awarded 0.5 points.
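The scoring in step (4) can be sketched as below. The text does not say how R1 to R3 combine when more than one applies, so this sketch assumes the first matching rule wins and an item that matches none scores 0. The `search` and `mutual_info` callables, the threshold values 0.5 and 0.3, and the sample items are all hypothetical stand-ins, not values from the embodiment.

```python
def score_item(item, category, thesaurus, search, mutual_info, second_threshold):
    """Score one parallel item with rules R1-R3 (assumed: first match wins)."""
    if item in thesaurus:                                 # R1: item is a seed word
        return 1.0
    query = f"{item} {category}"                          # preset search format
    entry = search(query)                                 # result entry title, or None
    if entry is None:
        return 0.0
    if item in entry:                                     # R2: entry contains the item
        return 0.8
    if mutual_info(query, entry) >= second_threshold:     # R3: mutual information
        return 0.5
    return 0.0


def score_structure(items, first_threshold, **rule_args):
    """Mean item score, and whether the structure reaches the first threshold."""
    scores = [score_item(item, **rule_args) for item in items]
    mean = sum(scores) / len(scores)
    return mean, mean >= first_threshold


# Stand-ins for the search engine and the mutual-information measure.
fake_entries = {"Gala apple": "Gala apple"}
mean, is_variety = score_structure(
    ["Red Fuji", "Gala", "tender leaves"],
    first_threshold=0.5,            # made-up value
    category="apple",
    thesaurus={"Red Fuji", "Jonagold"},
    search=fake_entries.get,
    mutual_info=lambda query, entry: 0.0,
    second_threshold=0.3,           # made-up value
)
```

With these stand-ins the three items score 1, 0.8, and 0, so the structure's mean is about 0.6 and it is labelled a fruit-variety parallel structure.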
Step 2: extract the text before and after each fruit-variety parallel structure as the feature information for training the base classifiers, specifically:
From the sentence containing the fruit-variety parallel structure, extract the text that remains after removing the parallel structure itself, and use its words as the first sub-feature information; extract the text of the sentence immediately before and the sentence immediately after the sentence containing the parallel structure, and use their words as the second sub-feature information; take the first sub-feature information and the second sub-feature information together as the feature information.
The text corresponding to the above feature information is preprocessed by word segmentation, stop-word removal, and feature vector representation.
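As a rough illustration of Step 2, the sketch below builds the two sub-features and a bag-of-words vector. It is only a sketch under simplifying assumptions: English sample text with a regex tokenizer stands in for Chinese word segmentation, the stop-word list is a toy one, and all names are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "of", "and", "are", "in", "is"}  # toy list


def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def context_features(sentences, index, structure):
    """Return (first sub-feature, second sub-feature, bag-of-words vector)."""
    # First sub-feature: the host sentence with the parallel structure removed.
    first = tokenize(sentences[index].replace(structure, " "))
    # Second sub-feature: the sentences immediately before and after.
    neighbours = sentences[max(index - 1, 0):index] + sentences[index + 1:index + 2]
    second = tokenize(" ".join(neighbours))
    return first, second, Counter(first + second)


sentences = [
    "Apples are grown in Shaanxi.",
    "Common varieties include Red Fuji, Gala, Jonagold.",
    "Prices vary by region.",
]
first, second, bag = context_features(sentences, 1, "Red Fuji, Gala, Jonagold")
```

The `Counter` plays the role of the feature vector here; a real pipeline would map it onto a fixed vocabulary before training.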
Step 3: construct a base classifier using the support vector machine (SVM) classification algorithm, repeat the procedure to generate multiple base classifiers, and combine these base classifiers into an ensemble classifier whose output is the result of a vote over the outputs of the base classifiers. The specific steps are:
S01: according to a preset criterion, randomly and evenly divide the fruit-variety parallel structures and the non-fruit-variety parallel structures in the training sample data set into N parts and L parts respectively (the preset criterion is, for example, that the fruit-variety and non-fruit-variety parallel structures used for training stay at a ratio of roughly 1 to 1);
S02: randomly select N-1 parts of the fruit-variety parallel structures and K parts of the non-fruit-variety parallel structures as training samples, and use the remaining 1 part of the fruit-variety parallel structures and the remaining L-K parts of the non-fruit-variety parallel structures as test samples; learning with the SVM classification algorithm then produces one base classifier;
S03: repeat step S02 M times to obtain M base classifiers in total;
S04: combine the M base classifiers to obtain the ensemble classifier.
For example, given 520 fruit-variety parallel structure samples and 5500 non-fruit-variety parallel structure samples, each round randomly divides the fruit-variety parallel structures into 10 parts and the non-fruit-variety parallel structures into 100 parts. Nine parts are drawn at random from the fruit-variety parallel structures and nine parts from the non-fruit-variety parallel structures to form the training samples, so that the fruit-variety and non-fruit-variety parallel structures drawn stay at a ratio of roughly 1 to 1 (468 and 495 samples respectively). The remaining 1 part of the fruit-variety parallel structures and 91 parts of the non-fruit-variety parallel structures form the test samples.
One round of training with the SVM classification algorithm yields one base classifier. To construct the ensemble classifier, training is repeated in this way several times, and the resulting distinct base classifiers are combined into the ensemble classifier.
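The sampling and voting in S01 to S04 can be sketched with the standard library alone. This is a scaffold under stated assumptions, not the embodiment's implementation: the base learner is passed in as a function, and a toy one-dimensional midpoint learner stands in for the SVM; evaluation on the held-out test parts is omitted. All function names and the numeric sample data are hypothetical.

```python
import random
from collections import Counter


def split_parts(samples, n_parts, rng):
    """S01: shuffle and deal the samples into n_parts nearly equal parts."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def train_ensemble(pos, neg, n, l, k, m, train_fn, seed=0):
    """S02-S03: build m base classifiers, each from a fresh random split."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(m):
        pos_parts = split_parts(pos, n, rng)
        neg_parts = split_parts(neg, l, rng)
        held_out = rng.randrange(n)             # the 1 positive part kept for testing
        chosen_neg = rng.sample(range(l), k)    # the K negative parts used for training
        train_pos = [x for i, part in enumerate(pos_parts) if i != held_out for x in part]
        train_neg = [x for i in chosen_neg for x in neg_parts[i]]
        classifiers.append(train_fn(train_pos, train_neg))
    return classifiers


def vote(classifiers, x):
    """S04: majority vote over the base classifiers' outputs."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]


sizes = []


def midpoint_learner(train_pos, train_neg):
    """Toy 1-D learner standing in for the SVM: threshold between class means."""
    sizes.append((len(train_pos), len(train_neg)))
    cut = (sum(train_pos) / len(train_pos) + sum(train_neg) / len(train_neg)) / 2
    return lambda x: x > cut


positives = list(range(1000, 1520))   # 520 made-up "fruit-variety" samples
negatives = list(range(-5500, 0))     # 5500 made-up "non-fruit-variety" samples
ensemble = train_ensemble(positives, negatives, n=10, l=100, k=9, m=51,
                          train_fn=midpoint_learner)
```

With the counts from the example above, every round trains on 468 positive and 495 negative samples, and an odd number of base classifiers (51 here) avoids tied votes for a two-class output.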
The results obtained in this embodiment by combining 51 base classifiers into an ensemble classifier are given below. Multiple experiments were carried out so that the output of the ensemble classifier is stable, and the average accuracy of the ensemble classifier's output was calculated using the scoring rules above; the detailed calculation is not repeated here. The resulting average accuracy of the ensemble classifier is shown in Table 2.
Table 2. Average accuracy of the ensemble classifier
Number of classifiers | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average accuracy |
---|---|---|---|---|---|---|
1 | 68.90% | 65.30% | 69.70% | 62.30% | 69.20% | 66.72% |
5 | 69.40% | 73.50% | 71.70% | 72.80% | 70.20% | 71.52% |
11 | 74.80% | 76.00% | 73.20% | 76.10% | 73.90% | 74.80% |
31 | 77.80% | 76.50% | 78.60% | 77.40% | 77.30% | 77.52% |
51 | 78.60% | 78.40% | 78.40% | 78.70% | 77.20% | 78.28% |
101 | 78.10% | 78.20% | 79.70% | 78.80% | 78.90% | 78.74% |
Step 4: crawl the geographical location information of agricultural product suppliers (for example, the fruit's place of origin and the fruit retailer's address) and the price information of agricultural products from the preset commodity trading website (for example, Alibaba's wholesale site). Because Alibaba's wholesale site uses dynamic web pages, the geographical location information cannot be crawled directly with the static URL crawling method, so the dynamic pages are processed with the Selenium tool. The pseudocode provided in this embodiment is as follows:
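The embodiment's pseudocode does not survive in this text. As a loosely analogous sketch, and not the embodiment's own code, the Python below loads a dynamic page with Selenium's headless Chrome driver and then parses supplier location and price from the rendered HTML. The CSS selector, the page markup, and both function names are hypothetical; a real site would need its own selectors.

```python
import re


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load a dynamic page in headless Chrome and return the rendered HTML."""
    # Imported lazily so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the script-generated element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


# Hypothetical markup: <span class="location">...</span><span class="price">6.50</span>
LOCATION_RE = re.compile(r'<span class="location">([^<]+)</span>')
PRICE_RE = re.compile(r'<span class="price">([0-9]+(?:\.[0-9]+)?)</span>')


def parse_offers(html):
    """Pair each supplier location with the price that follows it."""
    locations = LOCATION_RE.findall(html)
    prices = [float(p) for p in PRICE_RE.findall(html)]
    return list(zip(locations, prices))


sample_html = ('<div><span class="location">Luochuan County</span>'
               '<span class="price">6.50</span></div>')
offers = parse_offers(sample_html)
```

For a static page the same `parse_offers` helper could be fed HTML fetched by a plain HTTP request, which corresponds to the static URL crawling alternative.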
Step 5: obtain geographic information on China's administrative divisions (provinces, cities, and counties) from online resources. Specifically, the geographic information for all provinces, cities, and counties in China is crawled from the web and stored in JSON format. The geographic information is divided into four levels: the first level is the province or municipality directly under the central government, the second level is the prefecture-level city, the third level is the district or county-level city, and the fourth level is the subdistrict or township.
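One plausible shape for the four-level JSON data is a nested tree, sketched below. The key names `name` and `children` are assumptions, since the text does not describe the schema, and the single branch shown is illustrative sample data only.

```python
import json

# Assumed schema: each node has a "name" and a list of "children".
GEO_JSON = """
{"name": "陕西省", "children": [
  {"name": "延安市", "children": [
    {"name": "洛川县", "children": [
      {"name": "凤栖街道", "children": []}
    ]}
  ]}
]}
"""

LEVELS = (
    "province or municipality",
    "prefecture-level city",
    "district or county-level city",
    "subdistrict or township",
)


def walk(node, path=()):
    """Yield the full path from the top level down to every node."""
    path = path + (node["name"],)
    yield path
    for child in node.get("children", []):
        yield from walk(child, path)


paths = list(walk(json.loads(GEO_JSON)))
deepest = max(paths, key=len)
```

Each path's length gives the node's level, so `deepest` here has one name for each entry in `LEVELS`.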
Step 6: divide the producing regions of agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety, obtaining the producing region information of each variety. This specifically includes:
(1) According to the hierarchy in the geographic information above (province or municipality, prefecture-level city, district or county-level city, subdistrict or township), perform consistency processing on the geographical location information of the agricultural product suppliers, for example by expressing each supplier's location in the full form "XX Province, XX City, XX County". This avoids problems caused by non-standard location statements, such as abbreviated place names or missing province, city, or county information.
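The consistency processing in (1) might look like the sketch below, which expands an abbreviated or partial location into a full (province, city, county) triple. It assumes a tiny illustrative hierarchy, matches on names with the final administrative suffix character dropped, and ignores the case of identical names in different provinces; none of these details come from the embodiment.

```python
# Tiny illustrative hierarchy: province -> city -> counties.
GEO = {
    "陕西省": {"延安市": ["洛川县", "黄陵县"]},
    "山东省": {"烟台市": ["栖霞市"]},
}


def short(name):
    """Drop a trailing administrative suffix character (省, 市, 县, 区)."""
    return name[:-1] if name[-1] in "省市县区" else name


def normalize_location(raw):
    """Return (province, city, county) for the deepest level named in raw."""
    best = None
    for province, cities in GEO.items():
        if short(province) in raw and best is None:
            best = (province, None, None)
        for city, counties in cities.items():
            if short(city) in raw and (best is None or best[1] is None):
                best = (province, city, None)
            for county in counties:
                if short(county) in raw:
                    # A county match fixes the whole path, so stop here.
                    return (province, city, county)
    return best


full = normalize_location("陕西洛川苹果")
```

A county match fills in the missing city and province from the hierarchy, which is the gap-filling behaviour the step describes.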
(2) For each agricultural product variety in the agricultural product variety information, calculate the producing region weight of the variety at each place of origin according to the number of times the variety appears together with each of its corresponding places of origin in online advertisements. The calculation is as follows:
Each time the agricultural product variety and the place of origin appear together in an online advertisement, the producing region weight is increased by 1.
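The counting rule in (2) amounts to a co-occurrence count, sketched below. Plain substring matching and the sample advertisements are assumptions made for illustration.

```python
from collections import Counter


def region_weights(ads, varieties, origins):
    """Add 1 to weight[(variety, origin)] for each ad that mentions both."""
    weights = Counter()
    for ad in ads:
        for variety in varieties:
            if variety not in ad:
                continue
            for origin in origins:
                if origin in ad:
                    weights[(variety, origin)] += 1
    return weights


# Made-up advertisements.
ads = [
    "Red Fuji apples from Luochuan, wholesale",
    "Luochuan Red Fuji, fresh this week",
    "Gala apples direct from Qixia",
]
weights = region_weights(ads, ["Red Fuji", "Gala"], ["Luochuan", "Qixia"])
```

Here "Red Fuji" and "Luochuan" co-occur in two advertisements, so that pair's weight is 2.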
(3) Find the main producing regions of each agricultural product variety in the agricultural product variety information, as follows:
Take the place of origin with the largest producing region weight, together with the places of origin that lie within a preset range of it (for example, 250 km) and whose producing region weight exceeds a preset third threshold (for example, 2), as the first main producing region;
(4) Among the remaining places of origin outside the first main producing region, successively determine a preset number of further main producing regions (for example, a second main producing region, a third main producing region, and so on) according to producing region weight and distance. The main producing regions of each agricultural product provided by this embodiment are shown in Table 3.
Table 3. Main producing regions of each agricultural product
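Steps (3) and (4) above can be sketched as a greedy grouping. The text does not spell out how later regions are chosen, so this sketch assumes step (4) simply repeats the step (3) rule on the remaining places; distances use the haversine formula, and the place labels, coordinates, and weights are made up.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def main_regions(weights, coords, radius_km=250, min_weight=2, max_regions=3):
    """Greedily group places of origin into main producing regions."""
    remaining = dict(weights)
    regions = []
    while remaining and len(regions) < max_regions:
        centre = max(remaining, key=remaining.get)      # largest weight left
        members = [centre] + [
            place for place in remaining
            if place != centre
            and remaining[place] > min_weight           # exceeds third threshold
            and haversine_km(coords[centre], coords[place]) <= radius_km
        ]
        regions.append(members)
        for place in members:
            del remaining[place]
    return regions


# Hypothetical places, weights, and coordinates.
weights = {"A": 10, "B": 5, "C": 7, "D": 1}
coords = {"A": (35.0, 109.0), "B": (35.5, 109.5), "C": (37.3, 120.8), "D": (35.2, 109.2)}
regions = main_regions(weights, coords, max_regions=2)
```

Place D lies close to A but is left out of the first region because its weight does not exceed the threshold of 2.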
Step 7: analyze the average price of agricultural products by combining the agricultural product variety information with the producing region information. The resulting prices of each variety in each region are shown in Table 4.
Table 4. Prices of each agricultural product variety in each region
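The averaging in Step 7 can be sketched as a group-by over (variety, main producing region). The offer records, region labels, and prices below are invented for illustration and are not the values of Table 4.

```python
from collections import defaultdict


def average_prices(offers, region_of):
    """Mean price per (variety, region); offers outside any region are skipped."""
    totals = defaultdict(lambda: [0.0, 0])   # (variety, region) -> [sum, count]
    for variety, origin, price in offers:
        region = region_of.get((variety, origin))
        if region is None:
            continue
        entry = totals[(variety, region)]
        entry[0] += price
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up offers: (variety, place of origin, unit price).
offers = [
    ("Red Fuji", "Luochuan", 6.0),
    ("Red Fuji", "Huangling", 8.0),
    ("Red Fuji", "Qixia", 4.0),
    ("Gala", "Elsewhere", 5.0),
]
region_of = {
    ("Red Fuji", "Luochuan"): "first main producing region",
    ("Red Fuji", "Huangling"): "first main producing region",
    ("Red Fuji", "Qixia"): "second main producing region",
}
averages = average_prices(offers, region_of)
```

The resulting dictionary has one average per variety and region, which is the kind of value a regional price table or map would then display.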
As Table 4 shows, the price of the same agricultural product differs between regions. For example, among apples, Red Fuji is priced higher in its first main producing region than in its second main producing region, and Gala is likewise priced higher in its first main producing region.
Fig. 5 is a schematic diagram of the nationwide price of Red Fuji apples in an embodiment of the present invention. As shown in Fig. 5, this embodiment uses geographic information system (GIS) technology to display the price of each agricultural product variety by region. The depth of color of each area in the figure shows at a glance how the price of Red Fuji apples is distributed nationwide: the darker an area, the higher the price of Red Fuji apples there.
The agricultural product price analysis method provided by this embodiment combines specific agricultural product varieties with the price differences of agricultural products across regions, and uses GIS technology to display the average price of each agricultural product in each producing region, providing ample and more accurate information for business decisions.
Technical content not disclosed in this embodiment is common knowledge to those of ordinary skill in the art. The above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. An agricultural product price analysis method, characterized by comprising the following steps:
training an ensemble classifier using a preset search engine, and obtaining agricultural product variety information according to the ensemble classifier;
crawling the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products from a preset commodity trading website;
dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each agricultural product variety, to obtain the producing region information of the agricultural products of each variety;
displaying the agricultural product price of each variety by region using geographic information system (GIS) technology, according to the producing region information of the agricultural products of each variety and the price information of the agricultural products;
wherein training an ensemble classifier using the preset search engine and obtaining the agricultural product variety information according to the ensemble classifier specifically comprises:
crawling the text corresponding to search terms on the preset search engine, and extracting the parallel structures in the text, wherein a parallel structure is text separated by enumeration commas;
selecting, from the parallel structures, parallel structures of agricultural product varieties of the preset category and parallel structures of agricultural product varieties not of the preset category as the training sample data set;
extracting the text before and after the parallel structures of agricultural product varieties of the preset category as feature information;
training multiple base classifiers using the support vector machine (SVM) classification algorithm according to the feature information, and combining the multiple base classifiers to constitute the ensemble classifier;
obtaining the agricultural product variety information using the ensemble classifier;
wherein crawling the corresponding text on the preset search engine and extracting the parallel structures in the text specifically comprises:
constructing a basic variety thesaurus for agricultural products of the preset category, and using the words in the thesaurus as seed words, the thesaurus containing a preset number of basic varieties of agricultural products of the preset category;
downloading, through the preset search engine, the entries corresponding to the basic variety words in the thesaurus, and extracting all text in the entries;
extracting all parallel structures in the text using a regular expression.
2. The agricultural product price analysis method according to claim 1, characterized in that selecting, from the parallel structures, parallel structures of agricultural product varieties of the preset category and parallel structures of agricultural product varieties not of the preset category as the training sample data set specifically comprises:
scoring each parallel item in a parallel structure according to preset scoring rules;
calculating the mean of the scores obtained by the parallel items in each parallel structure, and taking the mean as the score of that parallel structure;
comparing the score of each parallel structure with a preset first threshold, wherein if the score of the parallel structure reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset category, and otherwise it is a parallel structure of agricultural product varieties not of the preset category;
wherein the preset scoring rules comprise:
performing a search in the preset search engine using a preset search format, and awarding the parallel item 1 point if the parallel item appears in the thesaurus;
performing a search in the preset search engine using the preset search format, and awarding the parallel item 0.8 points if the result entry contains the parallel item;
performing a search in the preset search engine using the preset search format, and awarding the parallel item 0.5 points if the mutual information between the preset search format and the result entry reaches a preset second threshold;
the preset search format being: parallel item + space + category name of the preset agricultural product category.
3. The agricultural product price analysis method according to claim 1, characterized in that extracting the text before and after the parallel structures of agricultural product varieties of the preset category as feature information specifically comprises:
extracting, from the sentence containing the parallel structure of agricultural product varieties of the preset category, the text that remains after removing the parallel structure, and using its words as first sub-feature information;
extracting the text of the sentence immediately before and the sentence immediately after the sentence containing the parallel structure of agricultural product varieties of the preset category, and using its words as second sub-feature information;
taking the first sub-feature information and the second sub-feature information as the feature information.
4. The method according to claim 1, characterized in that training multiple base classifiers using the SVM classification algorithm according to the feature information and combining the multiple base classifiers to constitute the ensemble classifier specifically comprises the following steps:
dividing, according to a preset criterion, the parallel structures of agricultural product varieties of the preset category and the parallel structures of agricultural product varieties not of the preset category in the training sample data set into N parts and L parts respectively;
randomly selecting N-1 parts of the parallel structures of agricultural product varieties of the preset category and K parts of the parallel structures of agricultural product varieties not of the preset category as training samples, using the remaining 1 part of the parallel structures of agricultural product varieties of the preset category and the remaining L-K parts of the parallel structures of agricultural product varieties not of the preset category as test samples, and learning with the SVM classification algorithm to obtain one base classifier;
repeating the previous step M times to obtain M base classifiers;
combining the M base classifiers to obtain the ensemble classifier.
5. The method according to claim 4, characterized in that obtaining the agricultural product variety information using the ensemble classifier specifically comprises:
judging, using the ensemble classifier, whether a parallel structure among the parallel structures to be tested is a parallel structure of agricultural product varieties of the preset category; if so, adding the agricultural product varieties corresponding to the parallel structure to the corresponding agricultural product category, so as to obtain the agricultural product variety information.
6. The method according to claim 1, characterized in that crawling the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products from the preset commodity trading website specifically comprises:
crawling the geographical location information of agricultural product suppliers from static web pages of the preset commodity trading website using a static URL crawling method;
or,
crawling the geographical location information of agricultural product suppliers from dynamic web pages of the preset commodity trading website using the Selenium tool.
7. The method according to claim 1, characterized in that, before dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each agricultural product variety to obtain the producing region information of the agricultural products of each variety, the method further comprises:
obtaining geographic information on China's administrative divisions, wherein the geographic information is divided into four levels: the first level is the province or municipality directly under the central government, the second level is the prefecture-level city, the third level is the district or county-level city, and the fourth level is the subdistrict or township.
8. The method according to claim 7, characterized in that dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each agricultural product variety to obtain the producing region information of the agricultural products of each variety specifically comprises:
performing consistency processing on the geographical location information of the agricultural product suppliers according to the hierarchy of the geographic information;
for the agricultural product variety information, calculating the producing region weight of each variety at each place of origin according to the number of times the variety appears together with each of its corresponding places of origin in online advertisements;
taking the place of origin with the largest producing region weight, together with the places of origin that lie within a preset range of it and whose producing region weight exceeds a preset third threshold, as the first main producing region;
successively determining a preset number of further main producing regions among the remaining places of origin outside the first main producing region.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510516065.6A CN105205099B (en) | 2015-08-20 | 2015-08-20 | A kind of agricultural product price analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510516065.6A CN105205099B (en) | 2015-08-20 | 2015-08-20 | A kind of agricultural product price analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105205099A CN105205099A (en) | 2015-12-30 |
CN105205099B true CN105205099B (en) | 2018-11-20 |
Family
ID=54952783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510516065.6A Expired - Fee Related CN105205099B (en) | 2015-08-20 | 2015-08-20 | A kind of agricultural product price analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105205099B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649264B (en) * | 2016-11-21 | 2019-07-05 | 中国农业大学 | A kind of Chinese fruit variety information extraction method and device based on chapter information |
CN106777136A (en) * | 2016-12-19 | 2017-05-31 | 上海找钢网信息科技股份有限公司 | A kind of steel trade price index information map interactive exhibition system and method |
CN108648002A (en) * | 2018-04-28 | 2018-10-12 | 张青 | A kind of Estimation System and method of fruit weighting price |
CN109614538A (en) * | 2018-12-17 | 2019-04-12 | 广东工业大学 | A kind of extracting method, device and the equipment of agricultural product price data |
CN111461510A (en) * | 2020-03-19 | 2020-07-28 | 江苏省农业科学院 | Modern agricultural production and sales demonstration and promotion service system and method based on Internet of things |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101587568A (en) * | 2008-05-19 | 2009-11-25 | 北京中食新华科技有限公司 | Expert system used in agricultural-product supply-chain logistics system |
CN103577581A (en) * | 2013-11-08 | 2014-02-12 | 南京绿色科技研究院有限公司 | Method for forecasting price trend of agricultural products |
CN104732435A (en) * | 2015-04-03 | 2015-06-24 | 中国农业科学院农业信息研究所 | Agricultural product supply and demand matching system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7853473B2 (en) * | 2004-08-31 | 2010-12-14 | Revionics, Inc. | Market-based price optimization system |
Non-Patent Citations (2)
Title |
---|
"农业复杂自适应搜索模型研究及实现";黄河;《中国博士学位论文全文数据库 信息科技辑》;20101015;论文正文第2章、第3章 * |
"基于开源WebGIS的农产品供求与价格信息系统应用研究";唐伟;《中国优秀硕士学位论文全文数据库 基础科学辑》;20140515;论文正文第3章、第4章、第5章 * |
Also Published As
Publication number | Publication date |
---|---|
CN105205099A (en) | 2015-12-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20181120 Termination date: 20210820 |