CN105205099B - An agricultural product price analysis method - Google Patents

An agricultural product price analysis method

Info

Publication number
CN105205099B
Authority
CN
China
Prior art keywords
agricultural product
information
parallel construction
default
producing region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510516065.6A
Other languages
Chinese (zh)
Other versions
CN105205099A (en
Inventor
陈瑛
高万林
季烜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University
Priority to CN201510516065.6A
Publication of CN105205099A
Application granted
Publication of CN105205099B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Mining & Mineral Resources (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Animal Husbandry (AREA)
  • Agronomy & Crop Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an agricultural product price analysis method comprising the following steps: training a combined classifier using a preset search engine, and obtaining agricultural product variety information with the combined classifier; crawling, from a preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products; dividing the producing regions of the agricultural products according to the variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety; and, according to the producing region information of the agricultural products of each variety and the price information, displaying the prices of each variety by region using geographic information system (GIS) technology. The invention can provide sufficient and relatively accurate information for business decisions concerning agricultural products.

Description

An agricultural product price analysis method
Technical field
The present invention relates to the technical field of agricultural product price analysis, and in particular to an agricultural product price analysis method.
Background
The agricultural product market is an important factor in the national economy, politics and social stability. Strengthening the analysis of agricultural product price information, and obtaining both the changes in prices and the differences between regions, is important for stabilizing the agricultural product market and for providing scientific, accurate decision information to government departments, agricultural product wholesalers and agricultural producers. Government departments can apply appropriate macro-control based on price changes and regional differences, so as to better plan the layout of agricultural production, adjust its structure, organize sales, balance supply and demand between regions, avoid large price fluctuations and keep the market stable, thereby effectively addressing problems of the rural economy, rural development and the rural population. Wholesalers can adjust their business and sales strategies according to price fluctuations to obtain greater profits. Agricultural producers can adjust the varieties they plant according to supply and demand, avoiding unsalable produce and loss of income.
With the deepening reform of the socialist market economy system and the rapid development of the internet, agricultural product prices are increasingly influenced by market operating conditions and the circulation environment. Online trading of agricultural products is becoming increasingly common, and the volume of transaction data is growing sharply. The price of an agricultural product often varies significantly with its variety, origin and place of sale; how to mine these data fully and obtain the relationship among the three has become a research hotspot.
At present, many online quotation platforms exist in China, but they have the following problems:
First, they do not distinguish between varieties. For example, a quotation platform can often give only the price of watermelon in general, not the price of each watermelon variety;
Second, they do not distinguish between regions. For example, a quotation platform often does not give the origin of the agricultural product.
Such data cannot provide sufficient information for business decisions.
Summary of the invention
The technical problem to be solved by the present invention is that existing quotation platforms do not give quotations by specific agricultural product variety and origin, and therefore cannot provide sufficient information for business decisions.
To this end, the invention proposes an agricultural product price analysis method comprising the following steps:
Training a combined classifier using a preset search engine, and obtaining agricultural product variety information with the combined classifier;
Crawling, from a preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products;
Dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety;
According to the producing region information of the agricultural products of each variety and the price information of the agricultural products, displaying the prices of each variety by region using geographic information system (GIS) technology.
Preferably, training a combined classifier using a preset search engine and obtaining agricultural product variety information with the combined classifier specifically includes:
Crawling the text corresponding to the search terms on the preset search engine, and extracting the parallel expression structures in the text, where a parallel expression structure is text whose items are separated by the enumeration comma (、);
Selecting, from the parallel expression structures, parallel structures of agricultural product varieties of a preset category and parallel structures of agricultural product varieties of non-preset categories as the training sample data set;
Extracting the text before and after each parallel structure of agricultural product varieties of the preset category as feature information;
According to the feature information, training multiple base classifiers with a support vector machine (SVM) classification algorithm, and combining the multiple base classifiers to form the combined classifier;
Obtaining the agricultural product variety information using the combined classifier.
Preferably, crawling the corresponding text on the preset search engine and extracting the parallel expression structures in the text specifically includes:
Building a thesaurus of basic varieties of the agricultural products of the preset category, and using the words in the thesaurus as seed words, where the thesaurus contains a preset number of basic varieties of the agricultural products of the preset category;
Downloading, through the preset search engine, the entries corresponding to the basic variety words in the thesaurus, and extracting all text in those entries;
Extracting all parallel expression structures in the text using a regular expression.
Preferably, selecting from the parallel expression structures the parallel structures of agricultural product varieties of the preset category and of non-preset categories as the training sample data set specifically includes:
Scoring each parallel item in a parallel structure according to a preset scoring rule;
Calculating the mean of the scores obtained by the parallel items in each parallel structure, and taking that mean as the score of the parallel structure;
Comparing the score of each parallel structure with a preset first threshold: when the score of the parallel structure reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset category; otherwise it is a parallel structure of agricultural product varieties of a non-preset category;
Wherein the preset scoring rule includes:
Searching the preset search engine with a preset search format; if the parallel item appears in the thesaurus, the parallel item scores 1 point;
Searching the preset search engine with the preset search format; if the result entry contains the parallel item, the parallel item scores 0.8 points;
Searching the preset search engine with the preset search format; if the mutual information between the preset search format and the result entry reaches a preset second threshold, the parallel item scores 0.5 points;
The preset search format is: the parallel item + a space + the variety category of the agricultural product of the preset category.
Preferably, extracting the text before and after each parallel structure of agricultural product varieties of the preset category as feature information specifically includes:
Extracting the remaining text of the sentence containing the parallel structure, excluding the parallel structure itself, and taking the words in it as first sub-feature information;
Extracting the text of the sentence before and the sentence after the sentence containing the parallel structure, and taking the words in it as second sub-feature information;
Taking the first sub-feature information and the second sub-feature information together as the feature information.
Preferably, training multiple base classifiers with the support vector machine classification algorithm according to the feature information and combining them to form the combined classifier specifically includes:
According to a preset criterion, dividing the parallel structures of agricultural product varieties of the preset category and the parallel structures of agricultural product varieties of non-preset categories in the training sample data set into N parts and L parts, respectively;
Randomly selecting N-1 of the parts of preset-category parallel structures and K of the parts of non-preset-category parallel structures as training samples, using the remaining 1 part of preset-category parallel structures and the remaining L-K parts of non-preset-category parallel structures as test samples, and learning with the support vector machine classification algorithm to obtain one base classifier;
Repeating the previous step M times to obtain M base classifiers;
Combining the M base classifiers to obtain the combined classifier.
Preferably, obtaining the agricultural product variety information using the combined classifier specifically includes:
Using the combined classifier to judge whether a parallel structure among the parallel expression structures to be tested is a parallel structure of agricultural product varieties of the preset category; if so, adding the agricultural product varieties in that parallel structure to the corresponding agricultural product category, thereby obtaining the agricultural product variety information.
Preferably, crawling from the preset commodity trading website the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products specifically includes:
Using a static URL crawling method to crawl the geographical location information of agricultural product suppliers from static web pages of the preset commodity trading website;
Or,
Using the Selenium tool to crawl the geographical location information of agricultural product suppliers from dynamic web pages of the preset commodity trading website.
Preferably, before dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety to obtain the producing region information of the agricultural products of each variety, the method further includes:
Obtaining the geographic information of China's administrative regions, where the geographic information is divided into 4 levels: the first level is the province or municipality directly under the central government, the second level is the prefecture-level city, the third level is the district or county-level city, and the fourth level is the subdistrict or township.
Preferably, dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, specifically includes:
Performing consistency processing on the geographical location information of the agricultural product suppliers according to the order of levels in the geographic information;
For the agricultural product variety information, calculating the producing region weight of each variety at each origin according to the number of times each of the variety's multiple corresponding origins appears in online advertisements;
Taking the origin with the largest producing region weight, together with the origins that lie within a preset range of that origin and whose producing region weight exceeds a preset third threshold, as the first main producing region;
Determining in turn a preset number of further main producing regions among the remaining origins outside the first main producing region.
With the agricultural product price analysis method disclosed in the present invention, specific agricultural product varieties can be combined with the price differences of agricultural products across regions, and GIS technology is used to display the average price of each variety in each producing region, providing sufficient and relatively accurate information for business decisions.
Brief description of the drawings
The features and advantages of the present invention will be understood more clearly with reference to the accompanying drawings, which are schematic and should not be understood as limiting the invention in any way. In the drawings:
Fig. 1 shows a flow chart of the agricultural product price analysis method provided by an embodiment of the present invention;
Fig. 2 shows a flow chart of training the combined classifier using a preset search engine and obtaining agricultural product variety information in an embodiment of the present invention;
Fig. 3 shows a flow chart of training multiple base classifiers with the support vector machine classification algorithm and forming the combined classifier in an embodiment of the present invention;
Fig. 4 shows a flow chart of dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, in an embodiment of the present invention;
Fig. 5 shows a schematic diagram of the nationwide prices of Red Fuji apples in an embodiment of the present invention.
Specific embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of the agricultural product price analysis method provided by an embodiment of the present invention. As shown in Fig. 1, the agricultural product price analysis method of this embodiment includes the following steps:
A1: Train a combined classifier using a preset search engine, and obtain agricultural product variety information with the combined classifier;
A2: Crawl, from a preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products;
A3: Divide the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety;
A4: According to the producing region information of the agricultural products of each variety and the price information of the agricultural products, display the prices of each variety by region using geographic information system (GIS) technology.
The agricultural product price analysis method provided by the present invention combines specific agricultural product varieties with the differences between regions, and uses GIS technology to display the average price of each variety in each producing region, providing sufficient and relatively accurate information for business decisions.
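As a rough illustration of the data that step A4 would hand to a GIS display, the average price of each variety in each producing region could be computed as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the record layout `(variety, region, price)` and the function name `average_price_by_region` are hypothetical, and the map rendering itself is not shown.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_region(records):
    """Group crawled prices by (variety, producing region) and average them.

    `records` is an iterable of (variety, region, price) tuples; this layout
    is an assumption made for the sketch.
    """
    grouped = defaultdict(list)
    for variety, region, price in records:
        grouped[(variety, region)].append(price)
    # One average price per variety and producing region.
    return {key: mean(prices) for key, prices in grouped.items()}


# Made-up sample records, for illustration only.
sample = [
    ("Fuji", "region-1", 4.0),
    ("Fuji", "region-1", 6.0),
    ("Fuji", "region-2", 3.0),
]
averages = average_price_by_region(sample)
```

Each resulting value could then be attached to the geometry of its producing region in whatever GIS tool is used for display.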
Fig. 2 shows a flow chart of training the combined classifier using a preset search engine and obtaining agricultural product variety information in an embodiment of the present invention. As shown in Fig. 2, the step in this embodiment of training a combined classifier using a preset search engine and obtaining agricultural product variety information with the combined classifier specifically includes:
B1: Crawl the text corresponding to the search terms on the preset search engine, and extract the parallel expression structures in the text, where a parallel expression structure is text whose items are separated by the enumeration comma (、);
B2: Select, from the parallel expression structures, parallel structures of agricultural product varieties of a preset category and parallel structures of agricultural product varieties of non-preset categories as the training sample data set;
B3: Extract the text before and after each parallel structure of agricultural product varieties of the preset category as feature information;
B4: According to the feature information, train multiple base classifiers with a support vector machine (SVM) classification algorithm, and combine the multiple base classifiers to form the combined classifier;
B5: Obtain the agricultural product variety information using the combined classifier.
Further, crawling the corresponding text on the preset search engine and extracting the parallel expression structures in the text specifically includes:
S1: Build a thesaurus of basic varieties of the agricultural products of the preset category, and use the words in the thesaurus as seed words; the thesaurus contains a preset number of basic varieties of the agricultural products of the preset category;
S2: Download, through the preset search engine, the entries corresponding to the basic variety words in the thesaurus, and extract all text in those entries;
S3: Extract all parallel expression structures in the text using a regular expression.
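Step S3 can be sketched with a regular expression that looks for two or more items joined by the enumeration comma (、). The patent does not give its actual pattern, so the pattern and the function name below are assumptions made for illustration only.

```python
import re

# One item: a run of characters with no enumeration comma, no clause or
# sentence punctuation, and no whitespace (an assumed definition).
ITEM = r"[^、，。；：！？,.;:!?\s]+"
# A parallel expression structure: two or more items joined by "、".
PARALLEL = re.compile(rf"{ITEM}(?:、{ITEM})+")


def extract_parallel_structures(text: str) -> list[list[str]]:
    """Return each parallel expression structure as a list of its items."""
    return [match.group(0).split("、") for match in PARALLEL.finditer(text)]


# Made-up sentence: "Common apple varieties include Fuji, Gala, Guoguang;
# each tastes different."
sample = "苹果的常见品种有富士、嘎啦、国光，口感各不相同。"
structures = extract_parallel_structures(sample)
```

With this pattern the first and last items still carry the surrounding clause text (here the first item begins with the words preceding 富士), so a real pipeline would need a further trimming or segmentation step before scoring the items.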
Further, selecting from the parallel expression structures the parallel structures of agricultural product varieties of the preset category and of non-preset categories as the training sample data set specifically includes:
Scoring each parallel item in a parallel structure according to a preset scoring rule;
Calculating the mean of the scores obtained by the parallel items in each parallel structure, and taking that mean as the score of the parallel structure;
Comparing the score of each parallel structure with a preset first threshold: when the score of the parallel structure reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset category; otherwise it is a parallel structure of agricultural product varieties of a non-preset category;
Wherein the preset scoring rule includes:
Searching the preset search engine with a preset search format; if the parallel item appears in the thesaurus, the parallel item scores 1 point;
Searching the preset search engine with the preset search format; if the result entry contains the parallel item, the parallel item scores 0.8 points;
Searching the preset search engine with the preset search format; if the mutual information between the preset search format and the result entry reaches a preset second threshold, the parallel item scores 0.5 points;
The preset search format is: the parallel item + a space + the variety category of the agricultural product of the preset category.
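The scoring and thresholding described above might look like the following sketch. Several points are assumptions rather than statements of the patent: the three rules are tried in order and the first match decides the score, an item matching no rule scores 0, and the search-engine lookups are passed in as the hypothetical callables `result_contains` and `mutual_information`.

```python
from statistics import mean
from typing import Callable, Iterable


def score_item(item: str, category: str, seed_words: set[str],
               result_contains: Callable[[str, str], bool],
               mutual_information: Callable[[str], float],
               second_threshold: float) -> float:
    """Score one parallel item with the three rules, first match wins."""
    query = f"{item} {category}"  # preset search format: item + space + category
    if item in seed_words:                             # rule 1
        return 1.0
    if result_contains(query, item):                   # rule 2
        return 0.8
    if mutual_information(query) >= second_threshold:  # rule 3
        return 0.5
    return 0.0  # assumed score when no rule applies


def is_preset_category(items: Iterable[str], first_threshold: float,
                       **scoring) -> bool:
    """A structure is accepted when the mean item score reaches the threshold."""
    return mean(score_item(item, **scoring) for item in items) >= first_threshold


# Stubbed lookups and made-up thresholds, for illustration only.
scoring = dict(
    category="apple",
    seed_words={"Fuji", "Gala"},
    result_contains=lambda query, item: item == "Guoguang",
    mutual_information=lambda query: 0.0,
    second_threshold=0.3,
)
accepted = is_preset_category(["Fuji", "Gala", "Guoguang"], 0.8, **scoring)
rejected = is_preset_category(["fruit", "leaf"], 0.8, **scoring)
```

Passing the lookups in as callables keeps the scoring logic independent of any particular search engine.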
Preferably, extracting the text before and after each parallel structure of agricultural product varieties of the preset category as feature information specifically includes:
Extracting the remaining text of the sentence containing the parallel structure, excluding the parallel structure itself, and taking the words in it as first sub-feature information;
Extracting the text of the sentence before and the sentence after the sentence containing the parallel structure, and taking the words in it as second sub-feature information;
Taking the first sub-feature information and the second sub-feature information together as the feature information.
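The two sub-features can be sketched as below. This is a simplified illustration: sentence splitting and tokenization use plain regular expressions on English sample text, whereas Chinese text would need a word segmenter, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Naive word tokenizer (an assumption; Chinese would need segmentation)."""
    return re.findall(r"\w+", text)


def context_features(text: str, structure: str) -> tuple[list[str], list[str]]:
    """Return (first sub-feature, second sub-feature) for one parallel structure."""
    sentences = [s.strip()
                 for s in re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
                 if s.strip()]
    for idx, sentence in enumerate(sentences):
        if structure in sentence:
            # First sub-feature: words of the same sentence, structure removed.
            first = tokenize(sentence.replace(structure, " "))
            # Second sub-feature: words of the previous and the next sentence.
            neighbours = sentences[max(idx - 1, 0):idx] + sentences[idx + 1:idx + 2]
            second = [word for s in neighbours for word in tokenize(s)]
            return first, second
    return [], []


# Made-up sample; ordinary commas stand in for the enumeration comma.
sample = ("Apples are widely grown. Common varieties include Fuji, Gala, "
          "Guoguang in the north. Prices vary by region.")
first, second = context_features(sample, "Fuji, Gala, Guoguang")
```

The two word lists could then be turned into a bag-of-words vector before being passed to the classifier.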
Fig. 3 shows a flow chart of training multiple base classifiers with the support vector machine classification algorithm and forming the combined classifier in an embodiment of the present invention. As shown in Fig. 3, training multiple base classifiers with the support vector machine classification algorithm according to the feature information and combining them to form the combined classifier specifically includes the following steps:
S01: According to a preset criterion, divide the parallel structures of agricultural product varieties of the preset category and the parallel structures of agricultural product varieties of non-preset categories in the training sample data set into N parts and L parts, respectively;
S02: Randomly select N-1 of the parts of preset-category parallel structures and K of the parts of non-preset-category parallel structures as training samples, use the remaining 1 part of preset-category parallel structures and the remaining L-K parts of non-preset-category parallel structures as test samples, and learn with the support vector machine classification algorithm to obtain one base classifier;
S03: Repeat step S02 M times to obtain M base classifiers;
S04: Combine the M base classifiers to obtain the combined classifier.
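Steps S01 to S04 can be sketched as follows. To keep the example dependency-free, the SVM learner is replaced by a pluggable `train_base` callable, and a toy threshold learner stands in for it here. The patent does not state how the base classifiers are combined, so the majority vote is an assumption, and evaluation on the held-out test parts is omitted. All function names are hypothetical.

```python
import random
from collections import Counter


def split_parts(samples, parts):
    """S01: split samples into `parts` roughly equal parts."""
    return [samples[i::parts] for i in range(parts)]


def train_ensemble(positives, negatives, n, l, k, m, train_base, seed=0):
    """S02 and S03: train m base classifiers on random subsets of the parts."""
    rng = random.Random(seed)
    pos_parts = split_parts(positives, n)
    neg_parts = split_parts(negatives, l)
    ensemble = []
    for _ in range(m):
        pos_idx = rng.sample(range(n), n - 1)  # N-1 of the N positive parts
        neg_idx = rng.sample(range(l), k)      # K of the L negative parts
        train = [(x, 1) for i in pos_idx for x in pos_parts[i]]
        train += [(x, 0) for i in neg_idx for x in neg_parts[i]]
        ensemble.append(train_base(train))     # the SVM would be trained here
    return ensemble


def predict(ensemble, x):
    """S04: combine the base classifiers by majority vote (an assumption)."""
    votes = Counter(classifier(x) for classifier in ensemble)
    return votes.most_common(1)[0][0]


def toy_threshold_trainer(train):
    """Toy stand-in for SVM training on one-dimensional numeric samples."""
    lowest_pos = min(x for x, label in train if label == 1)
    highest_neg = max(x for x, label in train if label == 0)
    cut = (lowest_pos + highest_neg) / 2
    return lambda x: 1 if x >= cut else 0


ensemble = train_ensemble(list(range(10, 20)), list(range(0, 10)),
                          n=5, l=5, k=3, m=7, train_base=toy_threshold_trainer)
```

Choosing an odd M avoids tied votes; in practice `train_base` would fit an SVM on the feature vectors and return its prediction function.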
Further, obtaining the agricultural product variety information using the combined classifier specifically includes:
Using the combined classifier to judge whether a parallel structure among the parallel expression structures to be tested is a parallel structure of agricultural product varieties of the preset category; if so, adding the agricultural product varieties in that parallel structure to the corresponding agricultural product category, thereby obtaining the agricultural product variety information.
Preferably, crawling from the preset commodity trading website the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products specifically includes:
Using a static URL crawling method to crawl the geographical location information of agricultural product suppliers from static web pages of the preset commodity trading website;
Or,
Using the Selenium tool to crawl the geographical location information of agricultural product suppliers from dynamic web pages of the preset commodity trading website.
Here, Selenium is a tool for testing web applications.
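For the static-page case, the extraction step after a page has been fetched might look like the sketch below, which uses only the Python standard library. The markup is entirely hypothetical: the `price` and `location` class names and the sample HTML are assumptions, since the patent does not describe the page structure of the trading website. Dynamic pages would be rendered with Selenium first and are not shown.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect price/location pairs from elements with assumed class names."""

    FIELDS = {"price", "location"}  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None
        self._current = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.FIELDS:
            self._field = css_class

    def handle_data(self, data):
        if self._field is None:
            return
        self._current[self._field] = data.strip()
        self._field = None
        # Emit a record once both fields of a listing have been seen.
        if self.FIELDS.issubset(self._current):
            self.records.append(self._current)
            self._current = {}


# Made-up static page fragment, for illustration only.
page = (
    '<ul>'
    '<li><span class="price">4.5</span><span class="location">Province A, City B</span></li>'
    '<li><span class="price">3.8</span><span class="location">Province C, City D</span></li>'
    '</ul>'
)
parser = ListingParser()
parser.feed(page)
records = parser.records
```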
As a preferred option of this embodiment, before dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety to obtain the producing region information of the agricultural products of each variety, the method further includes:
Obtaining the geographic information of China's administrative regions, where the geographic information is divided into 4 levels: the first level is the province or municipality directly under the central government, the second level is the prefecture-level city, the third level is the district or county-level city, and the fourth level is the subdistrict or township.
Fig. 4 shows a flow chart of dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, in an embodiment of the present invention.
As shown in Fig. 4, dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each variety, to obtain the producing region information of the agricultural products of each variety, specifically includes:
SS1: Perform consistency processing on the geographical location information of the agricultural product suppliers according to the order of levels in the geographic information;
SS2: For the agricultural product variety information, calculate the producing region weight of each variety at each origin according to the number of times each of the variety's multiple corresponding origins appears in online advertisements;
SS3: Take the origin with the largest producing region weight, together with the origins that lie within a preset range of that origin and whose producing region weight exceeds a preset third threshold, as the first main producing region;
SS4: Determine in turn a preset number of further main producing regions among the remaining origins outside the first main producing region.
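Steps SS2 to SS4 can be sketched as below for a single variety. Several details are assumptions: the weight of an origin is taken to be its share of all mentions, "within a preset range" is modelled as a straight-line distance on hypothetical planar coordinates, and the function names are invented for the example.

```python
from collections import Counter
from math import dist


def region_weights(origin_mentions):
    """SS2: weight of each origin = its share of all mentions (assumed formula)."""
    counts = Counter(origin_mentions)
    total = sum(counts.values())
    return {origin: count / total for origin, count in counts.items()}


def main_producing_regions(weights, coords, radius, third_threshold, max_regions):
    """SS3 and SS4: repeatedly group the heaviest remaining origin with its
    nearby origins whose weight exceeds the third threshold."""
    remaining = dict(weights)
    regions = []
    while remaining and len(regions) < max_regions:
        center = max(remaining, key=remaining.get)
        members = [center] + [
            origin for origin in remaining
            if origin != center
            and dist(coords[origin], coords[center]) <= radius
            and remaining[origin] > third_threshold
        ]
        regions.append(members)
        for origin in members:
            del remaining[origin]
    return regions


# Made-up origins and coordinates, for illustration only.
mentions = ["A"] * 5 + ["B"] * 3 + ["C"] * 4 + ["D"] * 1
coords = {"A": (0, 0), "B": (1, 0), "C": (10, 0), "D": (10, 1)}
weights = region_weights(mentions)
regions = main_producing_regions(weights, coords, radius=2,
                                 third_threshold=0.1, max_regions=2)
```

Here origin D lies close to C but is left out of the second region because its weight does not exceed the third threshold.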
Taking fruit as the preset category of agricultural product, the specific implementation of the agricultural product price analysis method provided by this embodiment is described in detail below:
Step 1: Use the encyclopedia entries of a preset search engine (such as Baidupedia) to download entries on fruit, crawl the parallel expression structures separated by the enumeration comma (、) in the corresponding text, and use a voting mechanism to select fruit-variety parallel structures and non-fruit-variety parallel structures as the training sample data set for learning the classifier.
The training sample data set is obtained through the following steps:
(1) First, a thesaurus of basic fruit varieties is built from experience, and the words in it serve as the seed words from which the subsequent work starts. The thesaurus contains the 15 to 30 most common varieties of each fruit; for example, the seed thesaurus for apples is shown in Table 1:
The seed thesaurus of 1 apple of table
Fuji apple Loud, high-pitched sound
Hua Niu A Suke
Red marshal Qiao Najin
Jin Guan Luochuan
State's light Wang Lin
Qin Guan Red Star
State's light Red general
(2) Download the Baidu Baike entries corresponding to these fruit varieties, and extract all text information in the entries.
(3) Extract all parallel expression structures in the above text information using a regular expression. A parallel expression structure is the run of words separated by pause marks. For example, for the sentence "Adults and nymphs suck the juice of okra fruits, tender leaves, and tender shoots; after being damaged, the fruits become uneven and malformed, and the damaged pulp becomes hollow and corky", the regular expression matches and extracts "Adults and nymphs suck the juice of okra fruits, tender leaves, and tender shoots".
(4) Score each parallel item in a parallel structure according to the preset scoring rules; calculate the mean of the scores of the parallel items in each parallel structure and take this mean as the score of the parallel structure; then compare the score of each parallel structure with a preset first threshold. If the score reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset type; otherwise it is a parallel structure of agricultural product varieties of non-preset type.
The preset scoring rules are as follows:
R1: Search the preset search engine using the preset search format; if the parallel item appears in the thesaurus, the parallel item scores 1 point;
R2: Search the preset search engine using the preset search format; if the result entry contains the parallel item, the parallel item scores 0.8 points. The preset search format is: parallel item + space + variety category of the agricultural product of the preset type;
In this embodiment, Baidu Baike is searched using the format "parallel item" + space + "fruit category". For example, for the parallel item "Red Fuji", the query "Red Fuji apple" is searched in the Baidu Baike entries; if the result is the entry "Red Fuji apple", which contains the parallel item "Red Fuji", then the parallel item "Red Fuji" scores 0.8 points.
R3: Search the preset search engine using the preset search format; if the mutual information between the search query and the result entry reaches a preset second threshold, the parallel item scores 0.5 points.
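Sub-steps (3) and (4) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the embodiment's own code: the character class that defines an item, the `search_title` and `mutual_info` callables, the "first matching rule wins" order, and the score of 0 for an item that matches no rule are all assumptions that the text does not specify.

```python
import re
from statistics import mean

# Two or more items joined by the Chinese enumeration comma "、".
# Assumption: an item is a run of CJK characters, letters, or digits.
ITEM = r"[\u4e00-\u9fffA-Za-z0-9]+"
PARALLEL = re.compile(rf"{ITEM}(?:、{ITEM})+")


def extract_parallel_structures(text):
    """Return every parallel structure in text as a list of its items."""
    return [m.group(0).split("、") for m in PARALLEL.finditer(text)]


def score_item(item, category, thesaurus, search_title, mutual_info, second_threshold):
    """Score one parallel item with rules R1 to R3 (assumed: first match wins)."""
    if item in thesaurus:                                        # R1
        return 1.0
    query = f"{item} {category}"                                 # item + space + category
    title = search_title(query)                                  # hypothetical entry lookup
    if title and item in title:                                  # R2
        return 0.8
    if title and mutual_info(query, title) >= second_threshold:  # R3
        return 0.5
    return 0.0                                                   # assumed: no rule matched


def is_variety_structure(items, first_threshold, **rule_args):
    """A structure counts as a variety structure if its mean item score reaches the threshold."""
    return mean(score_item(i, **rule_args) for i in items) >= first_threshold


# Toy usage with stubbed lookups (all values are made up for illustration).
structures = extract_parallel_structures("红富士、嘎啦、金冠。")
rule_args = dict(
    category="苹果",
    thesaurus={"红富士", "嘎啦"},
    search_title={"金冠 苹果": "金冠苹果"}.get,
    mutual_info=lambda query, title: 0.0,
    second_threshold=0.5,
)
print(structures, is_variety_structure(structures[0], 0.8, **rule_args))
```

When the sentence continues on either side of the enumeration, the first and last items of a match also carry the neighbouring words, which mirrors the okra example above, where the extracted result is the whole clause.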
Step 2: Extract the text before and after each fruit-variety parallel structure as the feature information for training the base classifiers. Specifically:
Extract the sentence containing the fruit-variety parallel structure, remove the parallel structure itself, and take the words in the remaining text as the first sub-feature information. Extract the text of the sentence before and the sentence after the sentence containing the fruit-variety parallel structure, and take the words in it as the second sub-feature information. The first sub-feature information and the second sub-feature information together form the feature information.
The text corresponding to the above feature information is then preprocessed: word segmentation, stop-word removal, and conversion to a feature vector representation.
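The context features of Step 2 might be collected as in the following sketch. It is illustrative only: the sentence splitter, the regex tokenizer (a placeholder for a real Chinese word segmenter), the bag-of-words `Counter` used as the feature vector, and the English toy text are assumptions, not details given by the embodiment.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on Chinese or Western sentence-ending punctuation."""
    parts = re.findall(r"[^。!?.!?]+[。!?.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def simple_tokenize(sentence):
    # Placeholder: Chinese text would need a real word segmenter here.
    return re.findall(r"\w+", sentence.lower())


def context_features(text, structure, stop_words=frozenset(), tokenize=simple_tokenize):
    """Bag of words from the structure's own sentence (minus the structure) and its neighbours."""
    sentences = split_sentences(text)
    idx = next(i for i, s in enumerate(sentences) if structure in s)
    # First sub-feature: the rest of the sentence that contains the structure.
    first = tokenize(sentences[idx].replace(structure, " "))
    # Second sub-feature: the previous and the next sentence, where they exist.
    neighbours = sentences[max(idx - 1, 0):idx] + sentences[idx + 1:idx + 2]
    second = [w for s in neighbours for w in tokenize(s)]
    return Counter(w for w in first + second if w not in stop_words)


text = "Apples are popular. Varieties include Fuji, Gala, Orin. They differ in taste."
features = context_features(text, "Fuji, Gala, Orin", stop_words={"are", "in", "they"})
print(features)
```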
Step 3: Construct a base classifier using the support vector machine (SVM) classification algorithm, then repeat the experiment several times to generate multiple base classifiers, and combine these base classifiers into a combined classifier whose output is the result of a vote over the outputs of the base classifiers. The specific steps are:
S01: According to a preset criterion, randomly and evenly divide the fruit-variety parallel structures and the non-fruit-variety parallel structures in the training sample data set into N parts and L parts, respectively (the preset criterion is, for example, that each part of fruit-variety parallel structures and each part of non-fruit-variety parallel structures stay in a ratio of roughly 1 to 1);
S02: Randomly select N-1 parts of the fruit-variety parallel structures and K parts of the non-fruit-variety parallel structures as training samples, and use the remaining 1 part of fruit-variety parallel structures and L-K parts of non-fruit-variety parallel structures as test samples; learning with the SVM classification algorithm then produces one base classifier;
S03: Repeat step S02 M times to obtain M base classifiers in total.
S04: Combine the M base classifiers to obtain the combined classifier.
For example, given 520 fruit-variety parallel structure samples and 5500 non-fruit-variety parallel structure samples, each round randomly divides the fruit-variety parallel structures into 10 parts and the non-fruit-variety parallel structures into 100 parts. 9 parts are drawn at random from the fruit-variety parallel structures and 9 parts from the non-fruit-variety parallel structures to form the training samples, so that the totals of fruit-variety and non-fruit-variety parallel structures drawn stay in a ratio of roughly 1 to 1. The remaining 1 part of fruit-variety parallel structures and 91 parts of non-fruit-variety parallel structures form the test samples.
Training once with the SVM classification algorithm yields one base classifier. To construct the combined classifier, training is repeated in this way several times, and the resulting different base classifiers are combined into the combined classifier.
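The partition-and-vote procedure of S01 to S04 can be sketched as below. This is a minimal sketch under assumptions: `train_fn` stands for the SVM training step, and the `midpoint_trainer` used in the usage lines is a deliberately trivial placeholder learner on made-up numeric samples, not an SVM. Evaluation on the held-out parts is omitted.

```python
import random
from collections import Counter


def split_parts(samples, n_parts, rng):
    """Shuffle samples and deal them into n_parts roughly equal parts."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def train_ensemble(pos, neg, n, l, k, m, train_fn, seed=0):
    """Repeat the S01/S02 split m times and train one base classifier per round."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(m):
        pos_parts = split_parts(pos, n, rng)
        neg_parts = split_parts(neg, l, rng)
        # The parts are random subsets, so taking the first ones is a random draw.
        train_pos = [x for part in pos_parts[:n - 1] for x in part]
        train_neg = [x for part in neg_parts[:k] for x in part]
        # The held-out parts (1 positive part, l - k negative parts) would form the test set.
        classifiers.append(train_fn(train_pos, train_neg))
    return classifiers


def vote(classifiers, x):
    """Majority vote over the base classifiers' predictions."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]


def midpoint_trainer(train_pos, train_neg):
    """Placeholder base learner (NOT an SVM): threshold halfway between the class means."""
    threshold = (sum(train_pos) / len(train_pos) + sum(train_neg) / len(train_neg)) / 2
    return lambda x: x > threshold


pos = [float(v) for v in range(1000, 1052)]   # 52 made-up positive samples
neg = [float(v) for v in range(550)]          # 550 made-up negative samples
ensemble = train_ensemble(pos, neg, n=10, l=100, k=9, m=5, train_fn=midpoint_trainer)
print(vote(ensemble, 1020.0), vote(ensemble, 3.0))
```

Using an odd number of base classifiers, as in the 51-classifier configuration of this embodiment, avoids ties in a two-class vote.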
The results of the combined classifier are given below; in this embodiment, 51 base classifiers are combined to form the combined classifier. Several experiments were run so that the output of the combined classifier stabilizes, and the average accuracy of its output was evaluated using the scoring rules described above; the calculation itself is not repeated here. The resulting average accuracy of the combined classifier is shown in Table 2.
Table 2. Average accuracy of the combined classifier
Base classifiers  Run 1   Run 2   Run 3   Run 4   Run 5   Average accuracy
1                 68.90%  65.30%  69.70%  62.30%  69.20%  66.72%
5                 69.40%  73.50%  71.70%  72.80%  70.20%  71.52%
11                74.80%  76.00%  73.20%  76.10%  73.90%  74.80%
31                77.80%  76.50%  78.60%  77.40%  77.30%  77.52%
51                78.60%  78.40%  78.40%  78.70%  77.20%  78.28%
101               78.10%  78.20%  79.70%  78.80%  78.90%  78.74%
Step 4: Crawl the geographical location information of the agricultural product suppliers (for example, the fruit's place of origin and the fruit retailer's address) and the price information of the agricultural products from a preset commodity trading website (for example, Alibaba's wholesale site). Because Alibaba's wholesale site uses dynamic web pages, the geographical location information cannot be crawled directly with a static URL crawling method, so the dynamic pages are processed with the Selenium tool. The pseudocode provided in this embodiment is as follows:
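The pseudocode listing itself did not survive in this text. As a rough stand-in, and explicitly not the embodiment's own code, the sketch below shows one common way to read a JavaScript-rendered page with Selenium and then parse it. The CSS selector, the `address` and `price` class names, and the page markup are hypothetical; `fetch_rendered_html` needs the Selenium package and a browser driver, so only the parser is exercised in the usage lines.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load a dynamic page in a browser and return its rendered HTML."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until the script-generated element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css)))
        return driver.page_source
    finally:
        driver.quit()


class OfferParser(HTMLParser):
    """Collect the text of elements whose class is 'address' or 'price' (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.fields = {"address": [], "price": []}
        self._current = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self._current = cls if cls in self.fields else None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current].append(data.strip())
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def parse_offers(html):
    """Return (supplier address, price) pairs found in the rendered HTML."""
    parser = OfferParser()
    parser.feed(html)
    return list(zip(parser.fields["address"], parser.fields["price"]))


sample_html = '<div><span class="address">山东省烟台市</span><span class="price">6.5</span></div>'
print(parse_offers(sample_html))
```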
Step 5: Obtain the geographic information of China's administrative regions (provinces, cities, and counties) from Internet resources. Specifically, the geographic information of all provinces, cities, and counties of China is crawled from the network and stored in JSON format. The geographic information is divided into 4 levels: the first level is the province or municipality directly under the central government, the second level is the prefecture-level city, the third level is the district or county-level city, and the fourth level is the subdistrict or township.
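One possible JSON layout for the 4-level geographic information is sketched below. The nesting scheme (objects for the first three levels, a list for the fourth) and the helper `iter_paths` are assumptions for illustration, and the few sample entries are abbreviated; the embodiment only states that JSON with 4 levels is used.

```python
import json

# Level 1 -> level 2 -> level 3 -> list of level 4 names (illustrative sample only).
GEO_JSON = """
{
  "山东省": {"烟台市": {"栖霞市": ["翠屏街道", "桃村镇"]}},
  "陕西省": {"延安市": {"洛川县": ["凤栖街道"]}}
}
"""


def iter_paths(tree, prefix=()):
    """Yield (province, prefecture, county, town) tuples from the nested structure."""
    if isinstance(tree, list):
        for leaf in tree:
            yield prefix + (leaf,)
    else:
        for name, subtree in tree.items():
            yield from iter_paths(subtree, prefix + (name,))


geo = json.loads(GEO_JSON)
paths = list(iter_paths(geo))
print(paths)
```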
Step 6: Divide the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each agricultural product variety, and obtain the producing region information of the agricultural products of each variety. Specifically:
(1) According to the hierarchical order in the above geographic information (province or municipality, prefecture-level city, district or county-level city, subdistrict or township), normalize the geographical location information of the agricultural product suppliers, for example by writing the supplier's geographical location out in full as "XX province, XX city, XX county". This avoids non-standard location statements by suppliers, such as abbreviated place names or missing province, city, or county information.
(2) For each agricultural product variety in the agricultural product variety information, calculate the producing region weight of the variety in each place of origin according to the number of times the variety and each of its corresponding places of origin appear in online advertisements. The calculation method is as follows:
Each time an agricultural product variety and a place of origin appear together in an online advertisement, the corresponding producing region weight is increased by 1.
(3) Find the main producing regions of each agricultural product variety in the agricultural product variety information, as follows:
Take the place of origin with the largest producing region weight, together with the places of origin that lie within a preset range of it (for example, 250 kilometers) and whose producing region weight exceeds a preset third threshold (for example, 2), as the first main producing region;
(4) Among the places of origin outside the first main producing region, successively determine a preset number of further main producing regions (for example, a second main producing region, a third main producing region, and so on) according to producing region weight and distance. The main producing regions of each agricultural product in this embodiment are shown in Table 3.
Table 3. Main producing regions of each agricultural product
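Sub-steps (1) to (4) of Step 6 can be sketched as follows. This is a hedged illustration, not the embodiment's implementation: the suffix stripping and substring matching in `normalize`, the use of great-circle (haversine) distance, the assumption that later main producing regions are found by repeating the first-region rule on the remaining origins, and all sample names, coordinates, and counts are made up for the example.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

SUFFIXES = ("省", "市", "县", "区")


def short(name):
    """Drop a trailing administrative suffix, e.g. '洛川县' -> '洛川'."""
    return name[:-1] if len(name) > 2 and name.endswith(SUFFIXES) else name


def normalize(raw, triples):
    """Complete a raw address to 'province, city, county' (or 'province, city')."""
    fallback = None
    for province, city, county in triples:
        if short(county) in raw:
            return f"{province}, {city}, {county}"
        if fallback is None and short(city) in raw:
            fallback = f"{province}, {city}"
    return fallback


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def main_regions(weights, coords, count, radius_km=250.0, min_weight=2):
    """Group origins into up to `count` main producing regions, heaviest first."""
    remaining = dict(weights)
    regions = []
    while remaining and len(regions) < count:
        center = max(remaining, key=remaining.get)
        members = [o for o, w in remaining.items()
                   if o == center
                   or (w > min_weight and haversine_km(coords[center], coords[o]) <= radius_km)]
        regions.append(members)
        for origin in members:
            del remaining[origin]
    return regions


# One entry per (variety, origin) co-occurrence in an advertisement: weight += 1 each time.
mentions = [("v", "A")] * 10 + [("v", "B")] * 5 + [("v", "C")] * 4 + [("v", "D")]
weights = {origin: w for (_, origin), w in Counter(mentions).items()}
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 10.0), "D": (0.0, 10.5)}
triples = [("山东省", "烟台市", "栖霞市"), ("陕西省", "延安市", "洛川县")]
print(normalize("洛川苹果基地", triples), main_regions(weights, coords, count=2))
```

Substring matching is ambiguous when different regions share a short name; a production version would need to resolve such cases against the full hierarchy.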
Step 7: Analyze the average price of the agricultural products by combining the agricultural product variety information with the agricultural product producing region information. The resulting prices of each variety in each region are shown in Table 4.
Table 4. Prices of each agricultural product variety in each region
As Table 4 shows, the price of the same agricultural product can differ between regions. For example, among apples, the price of Red Fuji in the first main producing region is higher than in the second main producing region, and the price of Gala is likewise higher in the first main producing region than in the other region.
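The per-region averaging of Step 7 amounts to a grouped mean, sketched below. The record layout `(variety, origin, price)`, the `region_of` mapping from Step 6, and all prices are hypothetical and chosen only to show the computation.

```python
from collections import defaultdict


def average_prices(offers, region_of):
    """Average price per (variety, main producing region).

    offers: iterable of (variety, origin, price) records.
    region_of: {(variety, origin): region label}; offers outside any region are skipped.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for variety, origin, price in offers:
        region = region_of.get((variety, origin))
        if region is None:
            continue
        acc = sums[(variety, region)]
        acc[0] += price
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


offers = [("Red Fuji", "A", 6.0), ("Red Fuji", "B", 8.0),
          ("Red Fuji", "C", 5.0), ("Gala", "X", 4.0)]
region_of = {("Red Fuji", "A"): 1, ("Red Fuji", "B"): 1, ("Red Fuji", "C"): 2}
print(average_prices(offers, region_of))
```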
Fig. 5 is a schematic diagram of the nationwide prices of Red Fuji apples in the embodiment of the present invention. As shown in Fig. 5, this embodiment uses geographic information system (GIS) technology to display the agricultural product prices of each variety by region. The shade of each area in the figure shows at a glance how Red Fuji prices are distributed nationwide: the darker the area, the higher the price of Red Fuji there.
The agricultural product price analysis method provided by this embodiment combines specific agricultural product varieties with the price differences of agricultural products across regions, and uses GIS technology to display the average price of each agricultural product in each producing region, providing ample and more accurate information for business decisions.
Technical content not disclosed in this embodiment is common knowledge to those of ordinary skill in the art. The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
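The "darker means more expensive" encoding can be illustrated by binning average prices into shade levels before handing them to a GIS renderer. The function below is a hypothetical sketch: the equal-width binning, the number of levels, and the sample prices are assumptions, and the map rendering itself is not shown.

```python
def shade_levels(prices, levels=5):
    """Map each region's average price to a shade index: 0 = lightest, levels - 1 = darkest."""
    lo, hi = min(prices.values()), max(prices.values())
    span = (hi - lo) or 1.0   # avoid division by zero when all prices are equal
    return {region: min(int((p - lo) / span * levels), levels - 1)
            for region, p in prices.items()}


print(shade_levels({"A": 4.0, "B": 6.0, "C": 8.0}))
```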

Claims (8)

1. An agricultural product price analysis method, characterized by comprising the following steps:
training a combined classifier using a preset search engine, and obtaining agricultural product variety information according to the combined classifier;
crawling, from a preset commodity trading website, geographical location information of the suppliers of each agricultural product variety and price information of the agricultural products;
dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each agricultural product variety, to obtain producing region information of the agricultural products of each variety;
displaying the agricultural product prices of each variety by region using geographic information system (GIS) technology, according to the producing region information of the agricultural products of each variety and the price information of the agricultural products;
wherein training a combined classifier using a preset search engine and obtaining agricultural product variety information according to the combined classifier specifically comprises:
crawling text corresponding to search terms on the preset search engine, and extracting the parallel expression structures in the text, wherein a parallel expression structure is text separated by pause marks;
selecting, from the parallel expression structures, parallel structures of agricultural product varieties of a preset type and parallel structures of agricultural product varieties of non-preset type as a training sample data set;
extracting the text before and after the parallel structures of agricultural product varieties of the preset type as feature information;
training multiple base classifiers using a support vector machine classification algorithm according to the feature information, and combining the multiple base classifiers to form the combined classifier;
obtaining the agricultural product variety information using the combined classifier;
wherein crawling the corresponding text on the preset search engine and extracting the parallel expression structures in the text specifically comprises:
constructing a basic variety thesaurus for agricultural products of the preset type, and using the words in the thesaurus as seed words, the thesaurus containing a preset number of basic varieties of agricultural products of the preset type;
downloading, through the preset search engine, the entries corresponding to the basic variety words of the agricultural products in the thesaurus, and extracting all text information in the entries;
extracting all parallel expression structures in the text information using a regular expression.
2. The agricultural product price analysis method according to claim 1, characterized in that selecting, from the parallel expression structures, parallel structures of agricultural product varieties of the preset type and parallel structures of agricultural product varieties of non-preset type as the training sample data set specifically comprises:
scoring each parallel item in a parallel structure according to preset scoring rules;
calculating the mean of the scores of the parallel items in each parallel structure, and taking the mean as the score of the parallel structure;
comparing the score of each parallel structure with a preset first threshold; when the score of the parallel structure reaches the first threshold, the parallel structure is a parallel structure of agricultural product varieties of the preset type, and otherwise it is a parallel structure of agricultural product varieties of non-preset type;
wherein the preset scoring rules comprise:
searching the preset search engine using a preset search format, and if the parallel item appears in the thesaurus, determining that the parallel item scores 1 point;
searching the preset search engine using the preset search format, and if the result entry contains the parallel item, determining that the parallel item scores 0.8 points;
searching the preset search engine using the preset search format, and if the mutual information between the search query and the result entry reaches a preset second threshold, determining that the parallel item scores 0.5 points;
the preset search format being: parallel item + space + variety category of the agricultural product of the preset type.
3. The agricultural product price analysis method according to claim 1, characterized in that extracting the text before and after the parallel structures of agricultural product varieties of the preset type as feature information specifically comprises:
extracting the sentence containing the parallel structure of agricultural product varieties of the preset type, removing the parallel structure, and taking the words in the remaining text as first sub-feature information;
extracting the text of the sentence before and the sentence after the sentence containing the parallel structure of agricultural product varieties of the preset type, and taking the words in it as second sub-feature information;
taking the first sub-feature information and the second sub-feature information as the feature information.
4. The method according to claim 1, characterized in that training multiple base classifiers using a support vector machine classification algorithm according to the feature information and combining the multiple base classifiers to form the combined classifier specifically comprises the following steps:
dividing, according to a preset criterion, the parallel structures of agricultural product varieties of the preset type and the parallel structures of agricultural product varieties of non-preset type in the training sample data set into N parts and L parts, respectively;
randomly selecting N-1 parts of the parallel structures of agricultural product varieties of the preset type and K parts of the parallel structures of agricultural product varieties of non-preset type as training samples, taking the remaining 1 part of the parallel structures of agricultural product varieties of the preset type and L-K parts of the parallel structures of agricultural product varieties of non-preset type as test samples, and learning with the support vector machine classification algorithm to obtain one base classifier;
repeating the previous step M times to obtain M base classifiers;
combining the M base classifiers to obtain the combined classifier.
5. The method according to claim 4, characterized in that obtaining the agricultural product variety information using the combined classifier specifically comprises:
using the combined classifier to judge whether a parallel structure among the parallel expression structures to be tested is a parallel structure of agricultural product varieties of the preset type; if so, adding the agricultural product varieties in the parallel structure to the corresponding agricultural product category, thereby obtaining the agricultural product variety information.
6. The method according to claim 1, characterized in that crawling, from the preset commodity trading website, the geographical location information of the suppliers of each agricultural product variety and the price information of the agricultural products specifically comprises:
crawling the geographical location information of the agricultural product suppliers on static web pages of the preset commodity trading website using a static URL crawling method;
or,
crawling the geographical location information of the agricultural product suppliers on dynamic web pages of the preset commodity trading website using the Selenium tool.
7. The method according to claim 1, characterized in that, before dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each agricultural product variety to obtain the producing region information of the agricultural products of each variety, the method further comprises:
obtaining the geographic information of China's administrative regions, wherein the geographic information is divided into 4 levels: the first level is the province or municipality directly under the central government, the second level is the prefecture-level city, the third level is the district or county-level city, and the fourth level is the subdistrict or township.
8. The method according to claim 7, characterized in that dividing the producing regions of the agricultural products according to the agricultural product variety information and the geographical location information of the suppliers of each agricultural product variety, to obtain the producing region information of the agricultural products of each variety, specifically comprises:
normalizing the geographical location information of the agricultural product suppliers according to the hierarchical order in the geographic information, so that it is expressed consistently;
for the agricultural product variety information, calculating the producing region weight of each agricultural product variety in each place of origin according to the number of times the variety and each of its corresponding places of origin appear in online advertisements;
taking the place of origin with the largest producing region weight, together with the places of origin that lie within a preset range of it and whose producing region weight exceeds a preset third threshold, as the first main producing region;
successively determining a preset number of further main producing regions among the places of origin outside the first main producing region.
CN201510516065.6A 2015-08-20 2015-08-20 A kind of agricultural product price analysis method Expired - Fee Related CN105205099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510516065.6A CN105205099B (en) 2015-08-20 2015-08-20 A kind of agricultural product price analysis method

Publications (2)

Publication Number Publication Date
CN105205099A CN105205099A (en) 2015-12-30
CN105205099B true CN105205099B (en) 2018-11-20

Family

ID=54952783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510516065.6A Expired - Fee Related CN105205099B (en) 2015-08-20 2015-08-20 A kind of agricultural product price analysis method

Country Status (1)

Country Link
CN (1) CN105205099B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649264B (en) * 2016-11-21 2019-07-05 中国农业大学 A kind of Chinese fruit variety information extraction method and device based on chapter information
CN106777136A (en) * 2016-12-19 2017-05-31 上海找钢网信息科技股份有限公司 A kind of steel trade price index information map interactive exhibition system and method
CN108648002A (en) * 2018-04-28 2018-10-12 张青 A kind of Estimation System and method of fruit weighting price
CN109614538A (en) * 2018-12-17 2019-04-12 广东工业大学 A kind of extracting method, device and the equipment of agricultural product price data
CN111461510A (en) * 2020-03-19 2020-07-28 江苏省农业科学院 Modern agricultural production and sales demonstration and promotion service system and method based on Internet of things

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587568A (en) * 2008-05-19 2009-11-25 北京中食新华科技有限公司 Expert system used in agricultural-product supply-chain logistics system
CN103577581A (en) * 2013-11-08 2014-02-12 南京绿色科技研究院有限公司 Method for forecasting price trend of agricultural products
CN104732435A (en) * 2015-04-03 2015-06-24 中国农业科学院农业信息研究所 Agricultural product supply and demand matching system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7853473B2 (en) * 2004-08-31 2010-12-14 Revionics, Inc. Market-based price optimization system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
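Title
"Research and Implementation of an Agricultural Complex Adaptive Search Model"; Huang He; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20101015; main text, Chapters 2 and 3 *
"Application Research on an Agricultural Product Supply, Demand and Price Information System Based on Open-Source WebGIS"; Tang Wei; China Master's Theses Full-text Database, Basic Sciences; 20140515; main text, Chapters 3, 4 and 5 *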

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181120

Termination date: 20210820