CN110275935A - Policy information processing method and device, storage medium, and electronic device - Google Patents
- Publication number
- CN110275935A (application CN201910390294.6A)
- Authority
- CN
- China
- Prior art keywords
- policy information
- policy
- model
- subject classification
- classification label
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a policy information processing method and device, a storage medium, and an electronic device. The method includes: obtaining policy information crawled from multiple data sources; pre-processing the policy information to obtain a target document that contains the text information of the policy information; extracting keywords from the target document; inputting the extracted keywords into a first model to obtain subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, and each pair includes multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for its output data; and associating the policy information with the matched subject classification labels and storing it in a search database. The invention solves the problem in the prior art that policy information is scattered and difficult to retrieve.
Description
Technical field
The present invention relates to the field of data retrieval, and in particular to a policy information processing method and device, a storage medium, and an electronic device.
Background art
At present, government policy information is mainly distributed across the websites of different government bodies, and the Internet is the main channel for publishing, viewing, and obtaining government information. However, because policies differ in type, issue date, and administrative department, policy information is highly scattered. Enterprises and individuals who need to understand the policies relevant to them find this extremely difficult: they must spend considerable time and effort searching the websites of individual government bodies and still cannot quickly find the information they need.
No effective solution to the above problem in the related art has yet been found.
Summary of the invention
Embodiments of the present invention provide a policy information processing method and device, a storage medium, and an electronic device, so as to at least solve the problem in the prior art that policy information is scattered and difficult to retrieve.
According to one embodiment of the present invention, a policy information processing method is provided, including: obtaining policy information crawled from multiple data sources; pre-processing the policy information to obtain a target document, where the target document includes the text information of the policy information; extracting keywords from the target document; inputting the extracted keywords into a first model to obtain subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, and each training sample pair includes multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for its output data; and associating the policy information with the matched subject classification labels and storing it in a search database.
Further, obtaining the policy information crawled from multiple data sources includes: downloading a preconfigured target application container from a cloud server; executing, in the target application container, crawling operations for the multiple data sources; and extracting the policy information from the crawled web addresses.
Further, extracting the keywords from the target document includes: extracting the keywords from the target document based on a term frequency-inverse document frequency (TF-IDF) model; and determining a word embedding vector for each keyword using a preset word-vector correspondence.
Further, before the extracted keywords are input into the first model to obtain the subject classification labels matching the policy information, the method further includes: obtaining a fully connected neural network classification model as an initial model; obtaining multiple training sample pairs; and training the fully connected neural network classification model with the multiple training sample pairs to obtain the first model.
Further, obtaining the multiple training sample pairs includes: obtaining multiple policy documents; pre-processing each policy document to obtain a bag of words corresponding to it, where each bag of words contains the vocabulary of the corresponding policy document; and inputting the multiple bags of words into a document topic generation model to obtain multiple topics and multiple subject classification labels corresponding to each topic, where each training sample pair includes a bag of words serving as input and the multiple subject classification labels corresponding to that bag of words serving as the training target for the output.
Further, after the policy information is associated with the matched subject classification labels and stored in the search database, the method further includes: obtaining a subject classification label to be queried; determining, in the search database, the multiple pieces of policy information corresponding to that label; extracting the content of a specified attribute from each piece of policy information, where the specified attribute is the attribute to be compared; and displaying the content of the specified attribute of the multiple pieces of policy information side by side in a preset display mode.
Further, associating the policy information with the matched subject classification label and storing it in the search database includes: extracting the project establishment time from the policy information; and inserting the policy information, based on the establishment time, into the policy information linked list corresponding to the subject classification label, where the linked list stores the policy information of that label in order of establishment time. Determining, in the search database, the multiple pieces of policy information corresponding to the label to be queried includes: looking up the head address of the linked list corresponding to that label. Displaying the content of the specified attribute of the multiple pieces of policy information side by side in the preset display mode includes: displaying a preset map template; and, starting from the head address of the linked list, repeating the following steps until every piece of policy information in the linked list has been marked on the preset map template: obtaining the piece of policy information currently being visited in the linked list, extracting the city where the project was established from it, and marking that city's position on the preset map template with a preset marker.
According to another embodiment of the present invention, a policy information processing device is provided, including: a first obtaining module, configured to obtain policy information crawled from multiple data sources; a pre-processing module, configured to pre-process the policy information to obtain a target document, where the target document includes the text information of the policy information; an extraction module, configured to extract keywords from the target document; an input module, configured to input the extracted keywords into a first model to obtain subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, and each training sample pair includes multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for its output data; and a storage module, configured to associate the policy information with the matched subject classification labels and store it in a search database.
Further, the first obtaining module includes: a download unit, configured to download a preconfigured target application container from a cloud server; an execution unit, configured to execute, in the target application container, crawling operations for the multiple data sources; and an extraction unit, configured to extract the policy information from the crawled web addresses.
Further, the extraction module includes: an extracting unit, configured to extract the keywords from the target document based on a TF-IDF model; and a determination unit, configured to determine a word embedding vector for each keyword using a preset word-vector correspondence.
Further, the device further includes: a second obtaining module, configured to obtain a fully connected neural network classification model as an initial model before the extracted keywords are input into the first model to obtain the matching subject classification labels; a third obtaining module, configured to obtain multiple training sample pairs; and a training module, configured to train the fully connected neural network classification model with the multiple training sample pairs to obtain the first model.
Further, the third obtaining module includes: a first acquisition unit, configured to obtain multiple policy documents; a pre-processing unit, configured to pre-process each policy document to obtain a bag of words corresponding to it, where each bag of words contains the vocabulary of the corresponding policy document; and an input unit, configured to input the multiple bags of words into a document topic generation model to obtain multiple topics and multiple subject classification labels corresponding to each topic, where each training sample pair includes a bag of words serving as input and the multiple subject classification labels corresponding to that bag of words serving as the training target for the output.
Further, the device further includes: a fourth obtaining module, configured to obtain a subject classification label to be queried after the policy information has been associated with the matched subject classification labels and stored in the search database; a determining module, configured to determine, in the search database, the multiple pieces of policy information corresponding to the label to be queried; a fifth obtaining module, configured to extract the content of a specified attribute from each piece of policy information, where the specified attribute is the attribute to be compared; and a display module, configured to display the content of the specified attribute of the multiple pieces of policy information side by side in a preset display mode.
Further, the storage module includes: a second acquisition unit, configured to extract the project establishment time from the policy information; and an insertion unit, configured to insert the policy information, based on the establishment time, into the policy information linked list corresponding to the subject classification label, where the linked list stores the policy information of that label in order of establishment time. The determining module includes: a lookup unit, configured to look up the head address of the linked list corresponding to the subject classification label. The display module includes: a display unit, configured to display a preset map template; and an execution unit, configured to start from the head address of the linked list and repeat the following steps until every piece of policy information in the linked list has been marked on the preset map template: obtaining the piece of policy information currently being visited in the linked list, extracting the city where the project was established from it, and marking that city's position on the preset map template with a preset marker.
According to still another embodiment of the present invention, a storage medium is further provided, in which a computer program is stored, where the computer program is configured to perform, when run, the steps in any one of the above method embodiments.
According to still another embodiment of the present invention, an electronic device is further provided, including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
With the present invention, policy information is crawled from the web to obtain its text, keywords are extracted from that text, and a pre-trained model produces the subject classification labels corresponding to the policy information. This solves the technical problem in the related art that policy information is scattered and difficult to retrieve: by integrating the crawled policy information and classifying it with pre-trained subject classification labels, all policy information of the same topic type can be retrieved conveniently.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an undue limitation on it. In the drawings:
Fig. 1 is a flowchart of a policy information processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a policy information processing device according to an embodiment of the present invention;
Fig. 3 is a hardware block diagram of a mobile terminal according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. Where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the description, claims, and accompanying drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.
Embodiment 1
This embodiment provides a policy information processing method that can be applied on the client side, where the client may run on a mobile terminal, a handheld terminal, or a similar computing device. Running on different computing devices differs only in the entity that executes the solution; those skilled in the art will appreciate that running on different computing devices produces the same technical effect.
As shown in Fig. 1, the policy information processing method provided in this embodiment includes the following steps:
Step 101: obtain the policy information crawled from multiple data sources;
Step 102: pre-process the policy information to obtain a target document, where the target document includes the text information of the policy information;
Step 103: extract keywords from the target document;
Step 104: input the extracted keywords into a first model to obtain subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, and each training sample pair includes multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for its output data;
Step 105: associate the policy information with the matched subject classification labels and store it in a search database.
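Steps 101 to 105 form a linear pipeline. The minimal Python sketch below shows only the data flow, assuming each component (pre-processing, keyword extraction, the first model) is passed in as a plain function and the search database is modeled as an in-memory dict from label to policies. `PolicyInfo`, `process_policies`, the example URL, and the lambda stand-ins are all hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PolicyInfo:
    url: str
    title: str
    body: str
    labels: List[str] = field(default_factory=list)


def process_policies(
    crawled: List[PolicyInfo],
    preprocess: Callable[[str], List[str]],
    extract_keywords: Callable[[List[str]], List[str]],
    classify: Callable[[List[str]], List[str]],
    index: Dict[str, List[PolicyInfo]],
) -> None:
    """Run steps 102-105 over policy information already crawled (step 101)."""
    for info in crawled:
        target_doc = preprocess(info.title + " " + info.body)  # step 102
        keywords = extract_keywords(target_doc)                # step 103
        info.labels = classify(keywords)                       # step 104 (first model)
        for label in info.labels:                              # step 105: index by label
            index.setdefault(label, []).append(info)


# Usage with trivial, hypothetical stand-ins for each component.
index: Dict[str, List[PolicyInfo]] = {}
docs = [PolicyInfo("https://example.com/p1", "Farm subsidy", "crops machinery subsidy")]
process_policies(
    docs,
    preprocess=str.split,
    extract_keywords=lambda words: words[:3],
    classify=lambda kws: ["agriculture"] if "crops" in kws else ["other"],
    index=index,
)
```

Passing the components as functions keeps the sketch independent of any particular segmenter or model; the sketches that follow illustrate possible implementations of the individual steps.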
Policy information is crawled from the network using crawler technology; for example, a specified list of websites (such as designated bid-announcement websites and search engines) is crawled once every predetermined period to obtain policy information.
To determine the subject classification labels of the policy information, the text information is extracted from the crawled web addresses; it may include the policy title, the policy content, and similar information. The policy information is then pre-processed to obtain the target document. Pre-processing may include segmenting the text into words (for example, with jieba segmentation and an imported dictionary of policy-specific vocabulary), removing stop words, and removing punctuation; the resulting target document consists of the words of the policy information.
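A minimal pre-processing sketch in Python, assuming a pluggable tokenizer. The default tokenizer here is a simple regex split used only so the example runs without third-party packages; for Chinese policy text the description suggests jieba, which could be plugged in (for example `jieba.load_userdict(path)` followed by `tokenize=jieba.lcut`). The stop-word list is a hypothetical placeholder.

```python
import re
import string
from typing import Callable, Iterable, List

# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"a", "and", "for", "in", "of", "on", "the", "to"}

# ASCII punctuation plus common full-width Chinese punctuation marks.
PUNCTUATION = set(string.punctuation) | set("，。、；：？！“”‘’（）《》")


def default_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: keeps runs of word characters only."""
    return re.findall(r"\w+", text)


def preprocess(text: str,
               tokenize: Callable[[str], List[str]] = default_tokenize,
               stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Segment text, then drop whitespace, punctuation, and stop words."""
    stops = set(stop_words)
    return [tok.lower() for tok in tokenize(text)
            if tok.strip() and tok not in PUNCTUATION and tok.lower() not in stops]


print(preprocess("Notice on the Subsidy for Farm Machinery, 2019."))
```

Keeping the tokenizer as a parameter means that the policy-specific dictionary only affects segmentation, while the filtering rules stay the same for every language.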
After the target document is obtained, keywords are extracted from it using a keyword extraction algorithm such as term frequency-inverse document frequency (TF-IDF) or TextRank.
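The TF-IDF option can be sketched in a few lines of standard-library Python. The example assumes the corpus is the set of already pre-processed target documents. It uses a smoothed IDF, `log((1 + N) / (1 + df)) + 1`, which is one common variant (the same form scikit-learn uses with `smooth_idf=True`); the patent does not state which IDF formula it uses. TextRank is not shown.

```python
import math
from collections import Counter
from typing import List


def tfidf_keywords(doc: List[str], corpus: List[List[str]], top_k: int = 5) -> List[str]:
    """Return the top_k words of `doc` ranked by TF-IDF against `corpus`."""
    doc_sets = [set(d) for d in corpus]  # for fast document-frequency lookups
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(doc).items():
        tf = count / len(doc)
        df = sum(1 for s in doc_sets if word in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF, never zero
        scores[word] = tf * idf
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


corpus = [["farm", "subsidy", "farm"], ["tax", "subsidy"], ["tax", "relief"]]
print(tfidf_keywords(corpus[0], corpus, top_k=2))  # ['farm', 'subsidy']
```

Words that are frequent in one policy but rare across the corpus ("farm" above) score highest, which is the property that makes them useful as classifier input.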
After the keywords are extracted and before they are input into the first model, they must be converted into the first model's input format; for example, a word2vec model (the preset word-vector correspondence) is used to determine the word embedding vector of each keyword, and all the word embedding vectors are input into the first model.
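The description does not say how several embedding vectors are combined into one model input. The sketch below assumes one simple option: the vectors of up to `max_keywords` keywords are concatenated in order, out-of-vocabulary keywords are skipped, and the result is zero-padded to a fixed length. The lookup table stands in for the preset word-vector correspondence (for example, the vectors of a trained word2vec model); the toy 2-dimensional vectors are hypothetical.

```python
from typing import Dict, List


def keywords_to_input(keywords: List[str],
                      embeddings: Dict[str, List[float]],
                      max_keywords: int,
                      dim: int) -> List[float]:
    """Concatenate keyword vectors into one fixed-length input (assumes each vector has length dim)."""
    # Skip keywords missing from the word-vector table, keep at most max_keywords.
    vectors = [embeddings[w] for w in keywords if w in embeddings][:max_keywords]
    flat = [x for vec in vectors for x in vec]
    flat.extend([0.0] * (max_keywords * dim - len(flat)))  # zero padding
    return flat


emb = {"farm": [1.0, 0.0], "subsidy": [0.0, 1.0]}
print(keywords_to_input(["farm", "unknown", "subsidy"], emb, max_keywords=3, dim=2))
```

A fixed-length input is what a fully connected network needs; averaging the vectors would be an equally valid, order-insensitive alternative.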
The first model is trained in advance to output matching subject classification labels for the input keywords. After receiving and processing the word embedding vectors, the first model outputs at least one subject classification label matching the policy information; the policy information is then associated with the matched labels and stored in the search database for later retrieval.
The search database stores multiple pieces of policy information, such as the title, issue time, author, body text, and classification (hierarchy level) of each policy. Each piece of information can be stored under a separate attribute, such as title, issue time, author, body text, classification (hierarchy level), and topic. The subject classification labels are indexed together with the content of all attributes of the policy information, so searching for a subject classification label in the search database returns all related policy information.
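The label-based index described above behaves like an inverted index from subject classification label to policy records. The sketch below assumes an in-memory store in which each policy is a dict of attributes; a production system would more likely use a database or search engine, and `PolicyStore` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


class PolicyStore:
    """Attribute records plus an inverted index from label to policy IDs."""

    def __init__(self) -> None:
        self.records: Dict[int, Dict[str, str]] = {}
        self.by_label: Dict[str, List[int]] = defaultdict(list)

    def add(self, policy_id: int, attrs: Dict[str, str], labels: List[str]) -> None:
        self.records[policy_id] = attrs
        for label in labels:
            self.by_label[label].append(policy_id)

    def search(self, label: str) -> List[Dict[str, str]]:
        # .get() avoids inserting empty entries for unknown labels.
        return [self.records[i] for i in self.by_label.get(label, [])]


store = PolicyStore()
store.add(1, {"title": "Farm subsidy", "issued": "2019-03-01"}, ["agriculture"])
store.add(2, {"title": "Rural consumption", "issued": "2018-11-20"}, ["economy", "agriculture"])
print([r["title"] for r in store.search("agriculture")])
```

A single policy can appear under several labels, which matches the model producing at least one label per policy.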
Optionally, obtaining the policy information crawled from multiple data sources may include the following steps:
Step 21: download a preconfigured target application container from a cloud server;
Step 22: execute, in the target application container, crawling operations for the multiple data sources;
Step 23: extract the policy information from the crawled web addresses.
The target application container may be a Docker container, which can be preconfigured with code so that the specified code performs crawling operations on specified data sources. Before the application container is downloaded from the cloud server, the crawling operations for the multiple data sources are packaged into the target application container, and the container is stored on the cloud server.
Because the target application container is stored on the cloud server, it can be downloaded whenever needed, and the crawling operation for each data source inside the container can be executed independently. This property allows project information to be crawled from the network by a computer cluster.
Specifically, each computer in the cluster downloads the target application container from the cloud server, each computer is assigned specified data sources, and each computer independently executes, in the container, the crawling operations for its assigned data sources.
After each computer has executed its crawling operations, the web addresses crawled by the cluster are consolidated, and information of the specified data type is extracted from all of them, so that crawling is deployed across multiple machines.
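The description does not specify how data sources are assigned to machines or how results are consolidated. The sketch below assumes round-robin assignment and an order-preserving de-duplication of the URLs returned by each node. The container execution itself is out of scope here, and the node and source names are hypothetical.

```python
from typing import Dict, Iterable, List


def assign_sources(sources: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Round-robin: source i goes to node i mod len(nodes)."""
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, src in enumerate(sources):
        plan[nodes[i % len(nodes)]].append(src)
    return plan


def merge_results(per_node: Iterable[List[str]]) -> List[str]:
    """Consolidate crawled URLs from all nodes, dropping duplicates but keeping first-seen order."""
    seen, merged = set(), []
    for urls in per_node:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


print(assign_sources(["site-a", "site-b", "site-c"], ["node-1", "node-2"]))
print(merge_results([["u1", "u2"], ["u2", "u3"]]))
```

Because each source's crawl is independent, any partitioning works. Round-robin is just the simplest choice that spreads sources evenly when they have similar sizes.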
Optionally, the first model may be trained using the following steps:
Step 31: obtain a fully connected neural network classification model as the initial model;
Step 32: obtain multiple training sample pairs;
Step 33: train the initial model with the multiple training sample pairs to obtain the first model.
In this optional embodiment, the first model is a fully connected neural network classification model based on a deep neural network (DNN). The fully connected classification model serving as the initial model and the multiple training sample pairs are obtained first, and the initial model is then trained with the training sample pairs. The model's input is the word vectors of the keywords, and its output is m subject classification labels together with the probability of each; the n (n ≤ m) labels with the highest probabilities are taken as the output subject classification labels.
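The inference side of this model can be sketched in plain Python: one hidden ReLU layer, m output logits turned into probabilities with softmax, and the n most probable labels kept. The layer sizes and toy weights are hypothetical, and training (for example, backpropagation with a cross-entropy loss) is not shown. If labels should be scored independently of one another, a per-label sigmoid would replace softmax.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def predict_labels(x: Sequence[float], w1: Matrix, b1: Sequence[float],
                   w2: Matrix, b2: Sequence[float],
                   labels: List[str], n: int) -> List[Tuple[str, float]]:
    """Input -> hidden ReLU -> m logits -> softmax; keep the n most probable labels."""
    probs = softmax(dense(relu(dense(x, w1, b1)), w2, b2))
    ranked = sorted(zip(labels, probs), key=lambda pair: -pair[1])
    return ranked[:n]


# Toy weights (hypothetical): 2 inputs, 2 hidden units, m = 3 labels.
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[2.0, 0.0], [0.0, 0.0], [1.0, 0.0]], [0.0, 0.0, 0.0]
labels = ["agriculture", "tax", "economy"]
top = predict_labels([1.0, 0.0], w1, b1, w2, b2, labels, n=2)
print([name for name, _ in top])  # ['agriculture', 'economy']
```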
Optionally, the multiple training sample pairs can be obtained as follows:
Step 41: obtain multiple policy documents;
Step 42: pre-process each policy document to obtain a bag of words corresponding to it, where each bag of words contains the vocabulary of the corresponding policy document;
Step 43: input the multiple bags of words into a document topic generation model to obtain multiple topics and multiple subject classification labels corresponding to each topic.
The policy documents may be obtained in any manner, which the embodiments of the present invention do not specifically limit. After the policy documents are obtained, each is pre-processed; pre-processing includes at least word segmentation (for example, jieba segmentation), which yields the words of each policy document and forms the bag of words for that document. The words in a bag have no order; the bag is simply a set of words. The bags of words are input into a document topic generation model (such as an LDA model), whose output is the topics corresponding to each bag of words, the keywords associated with each topic, and the association probability between each keyword and its topic. The keywords with higher association probabilities are extracted as subject classification labels, and each bag of words together with its corresponding keywords forms one training sample pair: the bag of words is the input of the fully connected neural network model, and the corresponding keywords are its output.
The following example shows how one training sample pair is obtained with the above steps. A policy document is obtained and processed with jieba segmentation, stop-word removal, punctuation removal, and so on, to obtain its bag of words. The bag of words is input into the LDA model, whose output includes the topics "agriculture" and "economy"; each topic corresponds to several keywords and their association probabilities with the topic. For example, the two keywords with the highest probabilities for "agriculture" are "crops" and "farm machinery", and the keyword with the highest probability for "economy" is "consumption". The keywords "crops", "farm machinery", and "consumption" are taken as the subject classification labels for the bag of words of this policy document, yielding one training sample pair.
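The worked example above can be reproduced with a small selection function. The sketch assumes the LDA output has already been reduced to a mapping from topic to `{keyword: association probability}` (with gensim, for example, `LdaModel.show_topic(topicid, topn)` returns such (word, probability) pairs), and that "keywords with higher association probability" means a fixed probability threshold. Both the threshold and the probability values are hypothetical, chosen only so that the output matches the example.

```python
from typing import Dict, List, Tuple


def make_training_pair(bag: List[str],
                       topic_words: Dict[str, Dict[str, float]],
                       threshold: float) -> Tuple[List[str], List[str]]:
    """Pair a bag of words with the topic keywords whose probability is >= threshold."""
    labels: List[str] = []
    for word_probs in topic_words.values():
        for word, prob in sorted(word_probs.items(), key=lambda kv: -kv[1]):
            if prob >= threshold and word not in labels:
                labels.append(word)
    return bag, labels


# Hypothetical LDA output for one policy document.
topic_words = {
    "agriculture": {"crops": 0.12, "farm machinery": 0.09, "irrigation": 0.03},
    "economy": {"consumption": 0.10, "market": 0.04},
}
bag = ["crops", "subsidy", "farm machinery", "consumption"]
print(make_training_pair(bag, topic_words, threshold=0.08)[1])
# ['crops', 'farm machinery', 'consumption']
```

A per-topic top-k rule would fit the example equally well; a threshold is used here only because it lets different topics contribute different numbers of labels.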
Optionally, after the policy information is associated with the matched subject classification labels and stored in the search database, multiple pieces of policy information can also be displayed side by side based on a subject classification label queried by the user, compared across several attributes such as region and issue time. Specifically, this includes the following steps:
Step 51: obtain a subject classification label to be queried;
Step 52: determine, in the search database, the multiple pieces of policy information corresponding to the label to be queried;
Step 53: extract the content of a specified attribute from each piece of policy information, where the specified attribute is the attribute to be compared;
Step 54: display the content of the specified attribute of the multiple pieces of policy information side by side in a preset display mode.
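Steps 51 to 54 can be sketched with a plain-text table as the "preset display mode". The sketch assumes the label lookup returns attribute dicts. The attribute names and the sample data are hypothetical, and a real front end would render a chart or grid instead.

```python
from typing import Dict, List


def compare_attributes(policies_by_label: Dict[str, List[Dict[str, str]]],
                       label: str, attrs: List[str]) -> str:
    """Steps 51-54: look up a label, pull the attributes to compare, render a text table."""
    rows = [[p.get(a, "") for a in attrs] for p in policies_by_label.get(label, [])]
    # Each column is as wide as its longest cell (header included).
    widths = [max([len(a)] + [len(r[i]) for r in rows]) for i, a in enumerate(attrs)]

    def fmt(cells: List[str]) -> str:
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    return "\n".join([fmt(attrs)] + [fmt(r) for r in rows])


db = {"agriculture": [{"region": "Hebei", "issued": "2019-03"},
                      {"region": "Jiangsu", "issued": "2018-11"}]}
print(compare_attributes(db, "agriculture", ["region", "issued"]))
```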
Optionally, when the search results are displayed in the preset display mode, the time attribute can be used to show on a map, one by one, the chain of times at which a certain class of policies was established in different geographic locations. Each policy is a node and can be marked with a preset element. For example, after the agricultural policies established within a certain period are retrieved, the policy nodes are displayed on the map one by one in chronological order, each at the city corresponding to the policy, which improves the visualization of the search results. Correspondingly, each policy is stored in the database in linked-list form, where each data unit of the list includes a list-unit ID, the policy's city, the policy content, the policy ID, the policy issue time, and so on. The order of the links is determined by the policies' establishment times, from earliest to latest. Each linked list stores the policy information chain for a specified investment policy field.
The embodiment in which the linked list is built based on the establishment time is described in detail below.
When step 105 is executed to associate the policy information with the matched subject classification labels and store it in the search database, the following steps are performed:
Step 61: extract the project establishment time from the policy information;
Step 62: insert the policy information, based on the establishment time, into the policy information linked list corresponding to the subject classification label, where the linked list stores the policy information of that label in order of establishment time;
Step 63: when determining, in the search database, the multiple pieces of policy information corresponding to the label to be queried, look up the head address of the linked list corresponding to that label.
When step 54 is executed to display the content of the specified attribute of the multiple pieces of policy information side by side in the preset display mode, the following steps are performed:
Step 64: display a preset map template;
Step 65: starting from the head address of the linked list, repeat the following steps until every piece of policy information in the linked list has been marked on the preset map template:
obtain the piece of policy information currently being visited in the linked list, extract the city where the project was established from it, and mark that city's position on the preset map template with a preset marker.
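Steps 61 to 65 can be sketched with one singly linked list per subject classification label, kept sorted by establishment time on insertion, and a traversal from the list head that emits one map marker per policy. The sketch assumes the following: establishment times are ISO `YYYY-MM-DD` strings, so string order equals time order; the "map template" is reduced to a dict of city coordinates; and the marker is a plain string. The class names, city coordinates, and policy IDs are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PolicyNode:
    unit_id: int
    policy_id: str
    city: str
    established: str                     # ISO date "YYYY-MM-DD": string order = time order
    content: str = ""
    next: Optional["PolicyNode"] = None  # link to the next (later) policy


class PolicyChains:
    """One singly linked list per subject classification label, ordered by establishment time."""

    def __init__(self) -> None:
        self.heads: Dict[str, PolicyNode] = {}

    def insert(self, label: str, node: PolicyNode) -> None:  # step 62
        head = self.heads.get(label)
        if head is None or node.established < head.established:
            node.next = head
            self.heads[label] = node
            return
        cur = head
        # Walk past every node established no later than the new one (stable for ties).
        while cur.next is not None and cur.next.established <= node.established:
            cur = cur.next
        node.next, cur.next = cur.next, node

    def head(self, label: str) -> Optional[PolicyNode]:  # step 63: head address
        return self.heads.get(label)


def mark_on_map(head: Optional[PolicyNode],
                city_coords: Dict[str, Tuple[float, float]],
                marker: str = "*") -> List[Tuple[str, Tuple[float, float], str]]:
    """Step 65: visit the chain from its head and emit (policy_id, position, marker)."""
    marks = []
    cur = head
    while cur is not None:
        pos = city_coords.get(cur.city)
        if pos is not None:  # skip cities the map template does not know
            marks.append((cur.policy_id, pos, marker))
        cur = cur.next
    return marks


chains = PolicyChains()
chains.insert("agriculture", PolicyNode(1, "P-A", "Shijiazhuang", "2019-05-01"))
chains.insert("agriculture", PolicyNode(2, "P-B", "Nanjing", "2018-01-10"))
chains.insert("agriculture", PolicyNode(3, "P-C", "Chengdu", "2019-01-01"))
# Illustrative (longitude, latitude) positions standing in for the map template.
coords = {"Shijiazhuang": (114.5, 38.0), "Nanjing": (118.8, 32.1), "Chengdu": (104.1, 30.7)}
print([pid for pid, _, _ in mark_on_map(chains.head("agriculture"), coords)])  # ['P-B', 'P-C', 'P-A']
```

Sorted insertion costs O(k) per policy for a chain of length k, but it makes the chronological map animation a single linear walk from the head.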
Optionally, the public-opinion monitoring data for the policies already in the database can be updated. Specifically, the comment lists on news content about a specified policy can be crawled from news portal websites, and a small number of comments are selected from the comment lists as sample data and manually labeled with a sentiment type. The sample data is further divided into two parts: one part serves as training samples for training a neural network model that labels the sentiment type of comments, and the other part serves as a validation set for verifying how well the neural network model labels sentiment types.
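The data-preparation part of this step (splitting the manually labeled comments into a training part and a validation part, then scoring a model on the validation part) can be sketched in Python as follows. The 80/20 ratio, the fixed seed, the `(text, label)` record shape, and the `predict` callable standing in for the trained neural network are all assumptions; the source does not specify them.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumed record shape after manual labeling: (comment_text, sentiment_label).
Sample = Tuple[str, str]


def split_samples(samples: Sequence[Sample], train_ratio: float = 0.8,
                  seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the labeled comments reproducibly and split them into a
    training part and a validation part."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def validation_accuracy(predict: Callable[[str], str],
                        validation: Sequence[Sample]) -> float:
    """Fraction of validation comments whose predicted sentiment matches the
    manual label; `predict` stands in for the trained neural network model."""
    if not validation:
        return 0.0
    hits = sum(1 for text, label in validation if predict(text) == label)
    return hits / len(validation)
```

Shuffling with a seeded `random.Random` keeps the split reproducible, so a change in validation accuracy reflects the model rather than a different random split.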
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
From the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
Embodiment 2
This embodiment further provides a processing device for policy information. The device is used to implement the above Embodiment 1 and its preferred implementations; for terms or implementations not described in detail in this embodiment, refer to the relevant description in Embodiment 1. What has already been described is not repeated here.
As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a schematic diagram of the processing device for policy information according to an embodiment of the present invention. As shown in Fig. 2, the device includes: a first obtaining module 10, a preprocessing module 20, an extraction module 30, an input module 40, and a storage module 50.
The first obtaining module 10 is configured to obtain the policy information crawled from multiple data sources. The preprocessing module 20 is configured to preprocess the policy information to obtain a target document, where the target document contains the text information in the policy information. The extraction module 30 is configured to extract keywords from the target document. The input module 40 is configured to input the extracted keywords into a first model to obtain a subject classification label matching the policy information, where the first model is a deep learning model trained in advance using multiple training sample pairs, and each training sample pair includes multiple keywords serving as the input data of the deep learning model and at least one subject classification label serving as the training target for the output data of the deep learning model. The storage module 50 is configured to associate the policy information with the matched subject classification label and store it in the search database.
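A minimal sketch of how modules 10 to 50 could be wired together, assuming each module is a plain Python callable injected into a container class. The class name `PolicyProcessingDevice`, its field names, and the dictionary used as the "search database" are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PolicyProcessingDevice:
    """Hypothetical wiring of the five modules of Fig. 2; each module is an
    injected callable so real implementations can be swapped in."""
    obtain: Callable[[], List[str]]             # first obtaining module 10
    preprocess: Callable[[str], str]            # preprocessing module 20
    extract: Callable[[str], List[str]]         # extraction module 30
    classify: Callable[[List[str]], List[str]]  # input module 40 (first model)
    # Stand-in for the search database: label -> stored policies.
    search_db: Dict[str, List[str]] = field(default_factory=dict)

    def store(self, policy: str, labels: List[str]) -> None:
        # Storage module 50: associate the policy with each matched label.
        for label in labels:
            self.search_db.setdefault(label, []).append(policy)

    def run(self) -> None:
        # Data flows 10 -> 20 -> 30 -> 40 -> 50 for every crawled policy.
        for policy in self.obtain():
            document = self.preprocess(policy)
            keywords = self.extract(document)
            labels = self.classify(keywords)
            self.store(policy, labels)
```

For example, `PolicyProcessingDevice(obtain=..., preprocess=str.lower, extract=str.split, classify=my_model)` runs the whole chain with trivial preprocessing and keyword extraction, where `my_model` is any function from a keyword list to a label list.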
Optionally, the first obtaining module includes: a download unit, configured to download a preconfigured target application container from a cloud server; an execution unit, configured to execute, in the target application container, crawling operations for the multiple data sources; and an extraction unit, configured to extract the policy information from the crawled web addresses.
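The extraction unit's job (pulling policy text out of a crawled page) can be sketched with Python's standard `html.parser`. This covers only extraction, not container download or crawling, and it assumes the policy title sits in `<title>` and the body text in `<p>` elements; real portals differ, so the tag choice and the `extract_policy_info` helper are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class PolicyPageParser(HTMLParser):
    """Collects the <title> text and the text of each <p> element.
    Treating these tags as the policy fields is an assumption."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self._in_title = False
        self._in_p = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is kept as part of the paragraph.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_policy_info(url: str, html: str) -> Dict[str, object]:
    """Hypothetical helper: turn one crawled page into a policy record."""
    parser = PolicyPageParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "content": parser.paragraphs}
```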
Optionally, the extraction module includes: an extracting unit, configured to extract the keywords in the target document based on a term frequency-inverse document frequency (TF-IDF) model; and a determination unit, configured to determine the word embedding vector of each keyword using a preset word-vector correspondence.
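A minimal pure-Python sketch of these two units, assuming documents are already tokenized. The TF-IDF variant shown (term frequency divided by document length, smoothed idf `log((1 + N) / (1 + df)) + 1`) is one common formulation, not necessarily the one the patent uses, and the zero vector for words missing from the embedding table is an assumption.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tfidf_keywords(docs: Sequence[List[str]], top_k: int = 3) -> List[List[str]]:
    """Rank the terms of each tokenized document by TF-IDF and keep the top_k.
    tf = count / document length; idf = log((1 + N) / (1 + df)) + 1 (smoothed)."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    keywords = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1
        scores = {
            term: (c / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, c in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def embed(keywords: List[str], table: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Look up each keyword in a preset word-vector table; unknown words map
    to a zero vector (an assumption, the source does not say)."""
    return [table.get(w, [0.0] * dim) for w in keywords]
```

In practice the word-vector table would come from a pretrained embedding model; a plain dictionary keeps the sketch self-contained.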
Optionally, the device further includes: a second obtaining module, configured to obtain a fully connected neural network classification model serving as the initial model, before the extracted keywords are input into the first model to obtain the subject classification label matching the policy information; a third obtaining module, configured to obtain multiple training sample pairs; and a training module, configured to train the fully connected neural network classification model with the multiple training sample pairs to obtain the first model.
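The patent does not give the network's depth, loss, or hyperparameters, so the sketch below uses the simplest fully connected classifier that fits the multi-label setting: one dense layer with a sigmoid output per subject classification label, trained by stochastic gradient descent on binary cross-entropy. The input vector (for example, averaged keyword embeddings), the class name `DenseMultiLabel`, the learning rate, and the epoch count are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class DenseMultiLabel:
    """A single fully connected layer mapping a feature vector to one sigmoid
    score per subject classification label (multi-label output)."""

    def __init__(self, n_in: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x: Sequence[float]) -> List[float]:
        return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bj)
                for row, bj in zip(self.w, self.b)]

    def train(self, data: Sequence[Tuple[Sequence[float], Sequence[int]]],
              epochs: int = 200, lr: float = 0.5) -> None:
        # Plain SGD on binary cross-entropy. With a sigmoid output, the gradient
        # of the loss w.r.t. each pre-activation is (prediction - target).
        for _ in range(epochs):
            for x, y in data:
                p = self.forward(x)
                for j in range(len(self.b)):
                    g = p[j] - y[j]
                    for i, xi in enumerate(x):
                        self.w[j][i] -= lr * g * xi
                    self.b[j] -= lr * g

    def predict(self, x: Sequence[float], threshold: float = 0.5) -> List[int]:
        # A label is assigned when its score reaches the threshold.
        return [1 if p >= threshold else 0 for p in self.forward(x)]
```

Independent sigmoid outputs (rather than a softmax) let one policy receive several subject classification labels at once, which matches a training target of "at least one" label. A deeper network would add hidden layers between input and output.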
Optionally, the third obtaining module includes: a first acquisition unit, configured to obtain multiple policy documents; a preprocessing unit, configured to preprocess each policy document to obtain a bag of words corresponding to each policy document, where each bag of words contains the words in the corresponding policy document; and an input unit, configured to input the multiple bags of words into a document topic generation model to obtain multiple topics and multiple subject classification labels corresponding to each topic, where each training sample pair includes a bag of words serving as the input and, as the training target for the output, the multiple subject classification labels corresponding to that bag of words.
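The source does not name the document topic generation model; latent Dirichlet allocation (LDA) is a common choice and is assumed here. The sketch runs a minimal collapsed Gibbs sampler for LDA, takes each topic's most frequent words as its subject classification labels (an assumption), and pairs every bag of words with the labels of its dominant topic. The function names, hyperparameters, and label rule are hypothetical; a production system would more likely use an established topic-modeling library.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def lda_gibbs(docs: Sequence[List[str]], n_topics: int, iters: int = 200,
              alpha: float = 0.1, beta: float = 0.01, seed: int = 0
              ) -> Tuple[List[str], List[List[int]], List[List[int]]]:
    """Minimal collapsed Gibbs sampler for LDA.
    Returns (vocabulary, document-topic counts, topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * V for _ in range(n_topics)]
    nk = [0] * n_topics
    z: List[List[int]] = []
    for d, doc in enumerate(docs):  # random initial topic for each token
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the current token, then resample its topic from the
                # full conditional p(z = t | rest).
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return vocab, ndk, nkw


def build_training_pairs(docs: Sequence[List[str]], n_topics: int,
                         labels_per_topic: int = 2, **lda_kwargs):
    """Pair each bag of words with the labels of its dominant topic."""
    vocab, ndk, nkw = lda_gibbs(docs, n_topics, **lda_kwargs)
    topic_labels = []
    for row in nkw:
        # Assumption: a topic's labels are its highest-count words.
        ranked = sorted(range(len(vocab)), key=lambda v: (-row[v], vocab[v]))
        topic_labels.append([vocab[v] for v in ranked[:labels_per_topic]])
    pairs = []
    for doc, counts in zip(docs, ndk):
        dominant = max(range(n_topics), key=lambda t: counts[t])
        pairs.append((Counter(doc), topic_labels[dominant]))
    return pairs
```

The `Counter` serves as the bag of words (word to count), matching the training-pair structure described above: the bag of words as input and the labels of its topic as the target.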
Optionally, the device further includes: a fourth obtaining module, configured to obtain a subject classification label to be queried after the policy information has been associated with the matched subject classification label and stored in the search database; a determining module, configured to determine, in the search database, the multiple pieces of policy information corresponding to the subject classification label to be queried; a fifth obtaining module, configured to extract the content of a specified attribute from each piece of policy information, where the specified attribute is the attribute to be compared; and a display module, configured to display, in a preset display mode, the contents of the specified attribute of the multiple pieces of policy information for comparison.
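One possible "preset display mode" is a side-by-side text table. The sketch below assumes the search database is a dictionary from label to a list of policy records (each a dictionary of attribute names to strings), and renders the chosen attributes of every policy stored under the queried label. The record layout and the function name `compare_policies` are hypothetical.

```python
from typing import Dict, List, Sequence


def compare_policies(db: Dict[str, List[Dict[str, str]]], label: str,
                     attributes: Sequence[str]) -> str:
    """Look up the policies stored under `label`, extract the specified
    attributes, and render them as an aligned plain-text table."""
    policies = db.get(label, [])
    # Missing attributes are shown as empty cells.
    rows = [[p.get(a, "") for a in attributes] for p in policies]
    widths = [max([len(a)] + [len(r[i]) for r in rows]) for i, a in enumerate(attributes)]
    lines = [" | ".join(a.ljust(w) for a, w in zip(attributes, widths))]
    lines.append("-+-".join("-" * w for w in widths))
    for r in rows:
        lines.append(" | ".join(c.ljust(w) for c, w in zip(r, widths)))
    return "\n".join(lines)
```

Keeping extraction (choosing attributes) separate from rendering makes it easy to replace the text table with a chart or the map view described in step 65.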
Optionally, the storage module includes: a second acquisition unit, configured to extract the project approval time from the policy information; and an insertion unit, configured to insert the policy information, based on the project approval time, into the policy information linked list corresponding to the subject classification label, where the policy information linked list stores the policy information of the corresponding subject classification label in order of project approval time. The determining module includes: a lookup unit, configured to look up the head address of the linked list corresponding to the subject classification label. The display module includes: a display unit, configured to display a preset map template; and an execution unit, configured to start from the head address of the linked list and repeat the following steps until every piece of policy information in the linked list has been marked on the preset map template: obtain the policy information currently being polled in the linked list, extract the project approval city from that policy information, and place a preset marker at the position of the project approval city on the preset map template.
It should be noted that each of the above modules may be implemented in software or in hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are distributed, in any combination, across different processors.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed in an order different from the one given here. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Embodiment 3
An embodiment of the present invention further provides a storage medium in which a computer program is stored, where the computer program is configured to perform, when run, the steps in any one of the above method embodiments.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Embodiment 4
An embodiment of the present invention further provides an electronic device, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor. Taking a mobile terminal as an example of the electronic device, Fig. 3 is a hardware block diagram of a mobile terminal according to an embodiment of the present invention. As shown in Fig. 3, the mobile terminal may include one or more processors 302 (only one is shown in Fig. 3; the processor 302 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a programmable logic device such as an FPGA) and a memory 304 for storing data. Optionally, the mobile terminal may further include a transmission device 306 for communication functions and an input/output device 308. Those of ordinary skill in the art will understand that the structure shown in Fig. 3 is merely illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in Fig. 3, or have a configuration different from that shown in Fig. 3.
The memory 304 may be used to store a computer program, for example the software programs and modules of application software, such as the computer program corresponding to the policy information processing method in the embodiments of the present invention. By running the computer program stored in the memory 304, the processor 302 executes various functional applications and data processing, that is, implements the above method. The memory 304 may include a high-speed random access memory and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 304 may further include memory located remotely from the processor 302, and such remote memory may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 306 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 306 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 306 may be a radio frequency (RF) module used to communicate with the Internet wirelessly.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the principle of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A processing method for policy information, characterized in that the method comprises:
Obtaining policy information crawled from multiple data sources;
Preprocessing the policy information to obtain a target document, wherein the target document contains the text information in the policy information;
Extracting keywords from the target document;
Inputting the extracted keywords into a first model to obtain a subject classification label matching the policy information, wherein the first model is a deep learning model trained in advance using multiple training sample pairs, and each training sample pair comprises multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for the output data of the deep learning model;
Associating the policy information with the matched subject classification label and storing it in a search database.
2. The method according to claim 1, characterized in that obtaining the policy information crawled from the multiple data sources comprises:
Downloading a preconfigured target application container from a cloud server;
Executing, in the target application container, crawling operations for the multiple data sources;
Extracting the policy information from the crawled web addresses.
3. The method according to claim 1, characterized in that extracting the keywords from the target document comprises:
Extracting the keywords in the target document based on a term frequency-inverse document frequency (TF-IDF) model;
Determining the word embedding vector of each keyword using a preset word-vector correspondence.
4. The method according to claim 1, characterized in that, before inputting the extracted keywords into the first model to obtain the subject classification label matching the policy information, the method further comprises:
Obtaining a fully connected neural network classification model serving as an initial model;
Obtaining multiple training sample pairs;
Training the fully connected neural network classification model with the multiple training sample pairs to obtain the first model.
5. The method according to claim 4, characterized in that obtaining the multiple training sample pairs comprises:
Obtaining multiple policy documents;
Preprocessing each policy document to obtain a bag of words corresponding to each policy document, wherein each bag of words contains the words in the corresponding policy document;
Inputting the multiple bags of words into a document topic generation model to obtain multiple topics and multiple subject classification labels corresponding to each topic, wherein each training sample pair comprises a bag of words serving as the input and, as the training target for the output, the multiple subject classification labels corresponding to that bag of words.
6. The method according to claim 1, characterized in that, after associating the policy information with the matched subject classification label and storing it in the search database, the method further comprises:
Obtaining a subject classification label to be queried;
Determining, in the search database, multiple pieces of policy information corresponding to the subject classification label to be queried;
Extracting the content of a specified attribute from each piece of policy information, wherein the specified attribute is an attribute to be compared;
Displaying, in a preset display mode, the contents of the specified attribute of the multiple pieces of policy information for comparison.
7. The method according to claim 6, characterized in that:
Associating the policy information with the matched subject classification label and storing it in the search database comprises: extracting a project approval time from the policy information; and inserting the policy information, based on the project approval time, into a policy information linked list corresponding to the subject classification label, wherein the policy information linked list stores the policy information of the corresponding subject classification label in order of project approval time;
Determining, in the search database, the multiple pieces of policy information corresponding to the subject classification label to be queried comprises: looking up the head address of the linked list corresponding to the subject classification label;
Displaying, in the preset display mode, the contents of the specified attribute of the multiple pieces of policy information for comparison comprises: displaying a preset map template; and, starting from the head address of the linked list, repeating the following steps until each piece of policy information in the linked list has been marked on the preset map template: obtaining the policy information currently being polled in the linked list, extracting a project approval city from that policy information, and placing a preset marker at the position corresponding to the project approval city on the preset map template.
8. A processing device for policy information, characterized in that the device comprises:
An obtaining module, configured to obtain policy information crawled from multiple data sources;
A preprocessing module, configured to preprocess the policy information to obtain a target document, wherein the target document contains the text information in the policy information;
An extraction module, configured to extract keywords from the target document;
An input module, configured to input the extracted keywords into a first model to obtain a subject classification label matching the policy information, wherein the first model is a deep learning model trained in advance using multiple training sample pairs, and each training sample pair comprises multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for the output data of the deep learning model;
A storage module, configured to associate the policy information with the matched subject classification label and store it in a search database.
9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to perform, when run, the method according to any one of claims 1 to 7.
10. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910390294.6A CN110275935A (en) | 2019-05-10 | 2019-05-10 | Processing method, device and storage medium, the electronic device of policy information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910390294.6A CN110275935A (en) | 2019-05-10 | 2019-05-10 | Processing method, device and storage medium, the electronic device of policy information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110275935A true CN110275935A (en) | 2019-09-24 |
Family ID: 67959048
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910390294.6A Pending CN110275935A (en) | 2019-05-10 | 2019-05-10 | Processing method, device and storage medium, the electronic device of policy information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110275935A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014063295A (en) * | 2012-09-20 | 2014-04-10 | Cybernet Systems Co Ltd | Context analyzing apparatus, information sorting apparatus and information classification system |
CN106649875A (en) * | 2017-01-04 | 2017-05-10 | 成都四方伟业软件股份有限公司 | Visualization system of public opinion big data |
US20170169103A1 (en) * | 2015-12-10 | 2017-06-15 | Agile Data Decisions LLC | Method and system for extracting, verifying and cataloging technical information from unstructured documents |
CN108491438A (en) * | 2018-02-12 | 2018-09-04 | 陆夏根 | A kind of technology policy retrieval analysis method |
CN109033358A (en) * | 2018-07-26 | 2018-12-18 | 李辰洋 | News Aggreagation and the associated method of intelligent entity |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909122A (en) * | 2019-10-10 | 2020-03-24 | 重庆金融资产交易所有限责任公司 | Information processing method and related equipment |
CN110909122B (en) * | 2019-10-10 | 2023-10-03 | 湖北华中电力科技开发有限责任公司 | Information processing method and related equipment |
CN110866116A (en) * | 2019-10-25 | 2020-03-06 | 远光软件股份有限公司 | Policy document processing method and device, storage medium and electronic equipment |
CN112995243A (en) * | 2019-12-02 | 2021-06-18 | 重庆市科学技术研究院 | Big data-based policy information pushing method and system |
CN111177794A (en) * | 2019-12-10 | 2020-05-19 | 平安医疗健康管理股份有限公司 | City image method, device, computer equipment and storage medium |
CN111046225B (en) * | 2019-12-20 | 2024-01-26 | 网易(杭州)网络有限公司 | Audio resource processing method, device, equipment and storage medium |
CN111046225A (en) * | 2019-12-20 | 2020-04-21 | 网易(杭州)网络有限公司 | Audio resource processing method, device, equipment and storage medium |
CN111126879A (en) * | 2019-12-31 | 2020-05-08 | 厦门美契信息技术有限公司 | Green financial item selection evaluation method |
CN111126879B (en) * | 2019-12-31 | 2024-05-31 | 厦门美契信息技术有限公司 | Green melt item selection evaluation method |
CN111326142A (en) * | 2020-01-21 | 2020-06-23 | 青梧桐有限责任公司 | Text information extraction method and system based on voice-to-text and electronic equipment |
CN111241110A (en) * | 2020-02-03 | 2020-06-05 | 广州欧赛斯信息科技有限公司 | Data management method based on job education diagnosis and modification platform |
CN111241110B (en) * | 2020-02-03 | 2023-06-06 | 广州欧赛斯信息科技有限公司 | Data management method based on staff and education diagnosis and improvement platform |
CN111400369A (en) * | 2020-03-06 | 2020-07-10 | 湖南城市学院 | Big data analysis-based policy information service system and method |
CN111475647A (en) * | 2020-03-19 | 2020-07-31 | 平安国际智慧城市科技股份有限公司 | Document processing method and device and server |
CN111506628A (en) * | 2020-04-22 | 2020-08-07 | 中国民航信息网络股份有限公司 | Data processing method and device |
CN111652524A (en) * | 2020-06-11 | 2020-09-11 | 中力数创(重庆)科技有限公司 | Method and device for intelligently matching policy and guiding improvement path |
CN112052305A (en) * | 2020-09-02 | 2020-12-08 | 平安资产管理有限责任公司 | Information extraction method and device, computer equipment and readable storage medium |
CN112131385A (en) * | 2020-09-15 | 2020-12-25 | 天津大学 | Structure analysis method of privacy policy |
CN112307210A (en) * | 2020-11-06 | 2021-02-02 | 中冶赛迪工程技术股份有限公司 | Document tag prediction method, system, medium and electronic device |
CN112307210B (en) * | 2020-11-06 | 2024-07-30 | 中冶赛迪工程技术股份有限公司 | Document tag prediction method, system, medium and electronic device |
CN114722801A (en) * | 2020-12-22 | 2022-07-08 | 航天信息股份有限公司 | Government affair data classification storage method and related device |
CN112541352A (en) * | 2020-12-23 | 2021-03-23 | 上海永骁智能技术有限公司 | Policy interpretation method based on deep learning |
CN112765338A (en) * | 2020-12-30 | 2021-05-07 | 江苏风云科技服务有限公司 | Policy data pushing method, policy calculator and computer equipment |
CN112906382A (en) * | 2021-02-05 | 2021-06-04 | 山东省计算中心(国家超级计算济南中心) | Policy text multi-label labeling method and system based on graph neural network |
CN113723737A (en) * | 2021-05-11 | 2021-11-30 | 天元大数据信用管理有限公司 | Enterprise portrait-based policy matching method, device, equipment and medium |
CN113469645A (en) * | 2021-06-21 | 2021-10-01 | 广州政企互联科技有限公司 | Intelligent storage method for policy data |
CN114510566A (en) * | 2021-11-29 | 2022-05-17 | 上海市黄浦区城市运行管理中心(上海市黄浦区城市网格化综合管理中心、上海市黄浦区大数据中心) | Hot word mining, classifying and analyzing method and system based on work order |
Similar Documents
Publication | Title |
---|---|
CN110275935A (en) | Processing method, device and storage medium, the electronic device of policy information |
CN110532451A (en) | Search method and device for policy text, storage medium, electronic device |
US20240078386A1 (en) | Methods and systems for language-agnostic machine learning in natural language processing using feature extraction |
US20210232761A1 (en) | Methods and systems for improving machine learning performance |
CN110020185A (en) | Intelligent search method, terminal and server |
CN105677931B (en) | Information search method and device |
CN110929125B (en) | Search recall method, device, equipment and storage medium thereof |
CN110569361A (en) | Text recognition method and equipment |
CN110704411A (en) | Knowledge graph building method and device suitable for art field and electronic equipment |
CN106970991B (en) | Similar application identification method and device, application search recommendation method and server |
CN113378061B (en) | Information searching method, device, computer equipment and storage medium |
EP2973038A1 (en) | Classifying resources using a deep network |
CN109513211A (en) | Processing method, device and the game resource display systems of fine arts resource file |
CN110427480B (en) | Intelligent personalized text recommendation method and device and computer readable storage medium |
CN108491388A (en) | Data set acquisition methods, sorting technique, device, equipment and storage medium |
CN110196936A (en) | Search method, device and the storage medium and electronic device of project |
CN109977291A (en) | Search method, device, equipment and storage medium based on physical knowledge map |
CN113704623B (en) | Data recommendation method, device, equipment and storage medium |
CN108140055A (en) | Trigger application message |
CN112783825A (en) | Data archiving method, data archiving device, computer device and storage medium |
CN112330510A (en) | Volunteer recommendation method and device, server and computer-readable storage medium |
CN112632264A (en) | Intelligent question and answer method and device, electronic equipment and storage medium |
Abbasi et al. | Organizing resources on tagging systems using t-org |
JPH08305724A (en) | Device for managing design supporting information document |
CN110929526A (en) | Sample generation method and device and electronic equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |