CN110275935A - Method and apparatus for processing policy information, storage medium, and electronic device - Google Patents

Info

Publication number
CN110275935A
Authority
CN
China
Prior art keywords
policy information
policy
model
subject classification
classification label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910390294.6A
Other languages
Chinese (zh)
Inventor
吴壮伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910390294.6A
Publication of CN110275935A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and apparatus for processing policy information, a storage medium, and an electronic device. The method comprises: obtaining policy information crawled from multiple data sources; preprocessing the policy information to obtain a target document, where the target document contains the text information of the policy information; extracting keywords from the target document; inputting the extracted keywords into a first model to obtain subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, each pair comprising multiple keywords serving as the model's input data and at least one subject classification label serving as the training target for the model's output data; and associating the policy information with the matched subject classification labels and storing it in a retrieval database. The invention addresses the problem in the prior art that policy information is scattered and difficult to retrieve.

Description

Method and apparatus for processing policy information, storage medium, and electronic device
Technical field
The present invention relates to the field of data retrieval, and in particular to a method and apparatus for processing policy information, a storage medium, and an electronic device.
Background
Government policy information is currently distributed mainly across the websites of different government bodies, and the network is the main channel for publishing, viewing, and obtaining government information. However, because policies differ in type, release time, and administering department, policy information is highly scattered. Enterprises and individuals who need to understand a relevant policy find this very difficult: they must spend a great deal of time and effort searching each government website and cannot quickly find the information they need.
No effective solution to the above problem in the related art has yet been found.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for processing policy information, a storage medium, and an electronic device, to at least solve the problem in the prior art that policy information is scattered and difficult to retrieve.
According to one embodiment of the present invention, a method for processing policy information is provided, comprising: obtaining policy information crawled from multiple data sources; preprocessing the policy information to obtain a target document, where the target document contains the text information of the policy information; extracting keywords from the target document; inputting the extracted keywords into a first model to obtain subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, each pair comprising multiple keywords serving as the input data of the deep learning model and at least one subject classification label serving as the training target for its output data; and associating the policy information with the matched subject classification labels and storing it in a retrieval database.
Further, obtaining the policy information crawled from multiple data sources comprises: downloading a preconfigured target application container from a cloud server; executing the crawl operations for the multiple data sources contained in the target application container; and extracting the policy information from the crawled web addresses.
Further, extracting the keywords from the target document comprises: extracting the keywords from the target document based on a term frequency-inverse document frequency (TF-IDF) model; and determining the word embedding vector of each keyword using a preset word vector correspondence.
Further, before the extracted keywords are input into the first model to obtain the subject classification labels matching the policy information, the method further comprises: obtaining a fully connected neural network classification model to serve as the initial model; obtaining multiple training sample pairs; and training the fully connected neural network classification model on the training sample pairs to obtain the first model.
Further, obtaining the multiple training sample pairs comprises: obtaining multiple policy documents; preprocessing each policy document to obtain a bag of words for each policy document, where each bag of words contains the vocabulary of the corresponding policy document; and inputting the bags of words into a document topic generation model to obtain multiple topics and the multiple subject classification labels corresponding to each topic, where each training sample pair comprises a bag of words serving as the input and the multiple subject classification labels corresponding to that bag of words serving as the training target for the output.
Further, after the policy information is associated with the matched subject classification labels and stored in the retrieval database, the method further comprises: obtaining a subject classification label to be queried; determining, in the retrieval database, the multiple items of policy information corresponding to the subject classification label to be queried; extracting the content of a specified attribute from each item of policy information, where the specified attribute is the attribute to be compared; and displaying the content of the specified attribute of the multiple items of policy information side by side in a preset display mode.
Further, associating the policy information with the matched subject classification labels and storing it in the retrieval database comprises: extracting the project establishment time from the policy information; and inserting the policy information into the policy information linked list corresponding to the subject classification label based on the project establishment time, where the linked list stores the policy information of the corresponding subject classification label in order of project establishment time. Determining, in the retrieval database, the multiple items of policy information corresponding to the subject classification label to be queried comprises: looking up the head address of the linked list corresponding to the subject classification label. Displaying the content of the specified attribute of the multiple items of policy information side by side in the preset display mode comprises: displaying a preset map template; and, starting from the head address of the linked list, repeating the following step until every item of policy information in the linked list has been marked on the preset map template: obtaining the item of policy information currently visited in the linked list, extracting the project establishment city from that item, and marking the position corresponding to that city on the preset map template with a preset marker.
According to another embodiment of the present invention, an apparatus for processing policy information is provided, comprising: a first obtaining module, configured to obtain policy information crawled from multiple data sources; a preprocessing module, configured to preprocess the policy information to obtain a target document, where the target document contains the text information of the policy information; an extraction module, configured to extract keywords from the target document; an input module, configured to input the extracted keywords into a first model to obtain subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, each pair comprising multiple keywords serving as the input data of the deep learning model and at least one subject classification label serving as the training target for its output data; and a storage module, configured to associate the policy information with the matched subject classification labels and store it in a retrieval database.
Further, the first obtaining module comprises: a download unit, configured to download a preconfigured target application container from a cloud server; an execution unit, configured to execute the crawl operations for the multiple data sources contained in the target application container; and an extraction unit, configured to extract the policy information from the crawled web addresses.
Further, the extraction module comprises: an extracting unit, configured to extract the keywords from the target document based on a term frequency-inverse document frequency (TF-IDF) model; and a determination unit, configured to determine the word embedding vector of each keyword using a preset word vector correspondence.
Further, the apparatus also comprises: a second obtaining module, configured to obtain, before the extracted keywords are input into the first model to obtain the subject classification labels matching the policy information, a fully connected neural network classification model to serve as the initial model; a third obtaining module, configured to obtain multiple training sample pairs; and a training module, configured to train the fully connected neural network classification model on the training sample pairs to obtain the first model.
Further, the third obtaining module comprises: a first obtaining unit, configured to obtain multiple policy documents; a preprocessing unit, configured to preprocess each policy document to obtain a bag of words for each policy document, where each bag of words contains the vocabulary of the corresponding policy document; and an input unit, configured to input the bags of words into a document topic generation model to obtain multiple topics and the multiple subject classification labels corresponding to each topic, where each training sample pair comprises a bag of words serving as the input and the multiple subject classification labels corresponding to that bag of words serving as the training target for the output.
Further, the apparatus also comprises: a fourth obtaining module, configured to obtain a subject classification label to be queried after the policy information is associated with the matched subject classification labels and stored in the retrieval database; a determining module, configured to determine, in the retrieval database, the multiple items of policy information corresponding to the subject classification label to be queried; a fifth obtaining module, configured to extract the content of a specified attribute from each item of policy information, where the specified attribute is the attribute to be compared; and a display module, configured to display the content of the specified attribute of the multiple items of policy information side by side in a preset display mode.
Further, the storage module comprises: a second obtaining unit, configured to extract the project establishment time from the policy information; and an insertion unit, configured to insert the policy information into the policy information linked list corresponding to the subject classification label based on the project establishment time, where the linked list stores the policy information of the corresponding subject classification label in order of project establishment time. The determining module comprises: a lookup unit, configured to look up the head address of the linked list corresponding to the subject classification label. The display module comprises: a display unit, configured to display a preset map template; and an execution unit, configured to start from the head address of the linked list and repeat the following step until every item of policy information in the linked list has been marked on the preset map template: obtaining the item of policy information currently visited in the linked list, extracting the project establishment city from that item, and marking the position corresponding to that city on the preset map template with a preset marker.
According to still another embodiment of the present invention, a storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to execute, when run, the steps of any of the above method embodiments.
According to still another embodiment of the present invention, an electronic device is also provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps of any of the above method embodiments.
Through the present invention, policy information is obtained by crawling, its text information is extracted, and keywords are then extracted and fed to a pre-trained model to obtain the subject classification labels corresponding to the policy information. This solves the technical problem in the related art that policy information is scattered and difficult to retrieve. By integrating the crawled policy information and classifying it with subject classification labels produced by a model trained in advance, the invention makes it convenient to retrieve all policy information of the same subject type.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a flowchart of the method for processing policy information according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the apparatus for processing policy information according to an embodiment of the present invention;
Fig. 3 is a hardware block diagram of a mobile terminal according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of this application, not all of them. Where no conflict arises, the embodiments in this application and the features in the embodiments can be combined with each other. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings of this application are used to distinguish similar objects and do not describe a particular order or sequence. Data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
This embodiment provides a method for processing policy information that can be applied on the client side, where the client may run on a mobile terminal, a handheld terminal, or a similar computing device. Running on different computing devices changes only the executing entity of the scheme; those skilled in the art will appreciate that running on different computing devices produces the same technical effect.
As shown in Fig. 1, the method for processing policy information provided in this embodiment includes the following steps:
Step 101: obtain the policy information crawled from multiple data sources;
Step 102: preprocess the policy information to obtain a target document, where the target document contains the text information of the policy information;
Step 103: extract the keywords from the target document;
Step 104: input the extracted keywords into the first model to obtain the subject classification labels matching the policy information, where the first model is a deep learning model trained in advance on multiple training sample pairs, each pair comprising multiple keywords serving as the input data of the deep learning model and at least one subject classification label serving as the training target for its output data;
Step 105: associate the policy information with the matched subject classification labels and store it in the retrieval database.
Policy information is crawled from the network using crawler technology. For example, a specified list of websites (such as specified bid announcement websites and search engines) is crawled at a predetermined interval to obtain policy information.
To determine the subject classification labels of the policy information, the text information in the crawled web addresses is extracted; the text information may include the policy title, the policy content, and similar information. The policy information is then preprocessed to obtain the target document. Preprocessing may include segmenting the text information into words (for example, using jieba word segmentation with an imported dictionary of vocabulary specific to policy information), removing stop words, and removing punctuation. The resulting target document contains the words of the policy information.
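The preprocessing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the stop-word list is hypothetical, and the default `segment` function simply splits on whitespace as a stand-in for a real word segmenter such as the dictionary-based segmentation the text mentions.

```python
import re

# Hypothetical stop-word list; a real deployment would load a full list.
STOP_WORDS = {"the", "of", "and", "for", "in", "a", "to"}

def preprocess(text, segment=str.split):
    """Turn raw policy text into a target document (a list of words).

    `segment` is an assumed stand-in for a real word segmenter; for
    Chinese text it would be replaced by a dictionary-based segmenter.
    """
    # Remove punctuation: keep only word characters and whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    words = segment(cleaned.lower())
    # Drop stop words.
    return [w for w in words if w not in STOP_WORDS]

doc = preprocess("Notice: Subsidies for the purchase of farm machinery, 2019.")
print(doc)
```

Passing the segmenter as a parameter keeps the stop-word and punctuation handling independent of the language being segmented.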
After the target document is obtained, keywords are extracted from it using a keyword extraction algorithm such as term frequency-inverse document frequency (TF-IDF) or TextRank.
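A small, self-contained sketch of TF-IDF keyword ranking is given below. The smoothed IDF formula, the `top_k` parameter, and the toy corpus are assumptions chosen for illustration; the patent does not specify which TF-IDF variant is used.

```python
import math
from collections import Counter

def tfidf_keywords(target_doc, corpus, top_k=3):
    """Rank the words of target_doc by TF-IDF and return the top_k.

    target_doc: list of words; corpus: list of word lists that
    includes target_doc. The IDF smoothing is an assumed variant.
    """
    tf = Counter(target_doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        # Document frequency: number of documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(target_doc)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

corpus = [
    ["farm", "machinery", "subsidy", "farm", "policy"],
    ["tax", "policy", "enterprise"],
    ["housing", "policy", "subsidy"],
]
keywords = tfidf_keywords(corpus[0], corpus, top_k=2)
print(keywords)
```

Words that appear in every document, such as "policy" here, receive the lowest IDF and so fall to the bottom of the ranking.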
After the keywords are extracted and before they are input into the first model, they must be converted into the input format of the first model. For example, a word2vec model (the preset word vector correspondence) is used to determine the word embedding vector of each keyword, and all the word embedding vectors are input into the first model.
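The keyword-to-vector lookup can be illustrated with a plain dictionary standing in for a trained word2vec table. The three-dimensional vectors, the zero vector for unknown words, and the mean pooling into a single fixed-length input are all assumptions made for this sketch; the text only states that the word embedding vectors are input into the first model.

```python
# Hypothetical 3-dimensional embedding table standing in for word2vec.
EMBEDDINGS = {
    "farm": [0.9, 0.1, 0.0],
    "machinery": [0.7, 0.2, 0.1],
    "tax": [0.0, 0.8, 0.2],
}
DIM = 3

def embed_keywords(keywords, table=EMBEDDINGS, dim=DIM):
    """Map each keyword to its vector; unknown words get a zero vector."""
    return [table.get(word, [0.0] * dim) for word in keywords]

def mean_pool(vectors, dim=DIM):
    """Average keyword vectors into one fixed-length input (an assumption)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

vectors = embed_keywords(["farm", "machinery", "unknown"])
features = mean_pool(vectors)
print(features)
```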
The first model is trained in advance to output matching subject classification labels from the input keywords. After receiving the word embedding vectors and performing its computation, the first model outputs at least one subject classification label matching the policy information. The policy information is then associated with the matched subject classification labels and stored in the retrieval database for later retrieval.
The retrieval database stores multiple items of policy information, for example the title, release time, author, body text, and category (hierarchical relationship) of each item. Each item can be stored under different attributes such as title, release time, author, body text, category (hierarchical relationship), and subject. The subject classification labels index the content of all attributes of the policy information, so searching the retrieval database for a subject classification label returns all related policy information.
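One way to picture the label-indexed retrieval database is a pair of relational tables. The sketch below uses an in-memory SQLite database; the schema, table names, and sample records are assumptions for illustration and are not taken from the patent.

```python
import sqlite3

def open_retrieval_db():
    """Create an in-memory database with an assumed two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE policy (
            id INTEGER PRIMARY KEY, title TEXT, issued TEXT, body TEXT);
        CREATE TABLE policy_label (
            policy_id INTEGER REFERENCES policy(id), label TEXT);
        CREATE INDEX idx_label ON policy_label(label);
    """)
    return conn

def store_policy(conn, title, issued, body, labels):
    """Store one policy and associate it with its subject labels."""
    cur = conn.execute(
        "INSERT INTO policy (title, issued, body) VALUES (?, ?, ?)",
        (title, issued, body))
    conn.executemany(
        "INSERT INTO policy_label VALUES (?, ?)",
        [(cur.lastrowid, label) for label in labels])
    return cur.lastrowid

def find_by_label(conn, label):
    """Return titles of all policies carrying the label, oldest first."""
    rows = conn.execute(
        "SELECT p.title FROM policy p "
        "JOIN policy_label l ON l.policy_id = p.id "
        "WHERE l.label = ? ORDER BY p.issued", (label,))
    return [row[0] for row in rows]

conn = open_retrieval_db()
store_policy(conn, "Farm machinery subsidy", "2019-03-01", "...",
             ["agriculture", "economy"])
store_policy(conn, "Crop insurance pilot", "2018-11-15", "...",
             ["agriculture"])
print(find_by_label(conn, "agriculture"))
```

Keeping labels in a separate indexed table lets one policy carry several subject classification labels while a label lookup stays a single indexed query.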
Optionally, the step of obtaining the policy information crawled from multiple data sources may include:
Step 21: download a preconfigured target application container from the cloud server;
Step 22: execute the crawl operations for the multiple data sources contained in the target application container;
Step 23: extract the policy information from the crawled web addresses.
The target application container may be a Docker container. A Docker container can be preconfigured with code so that the specified code executes the crawl operation for a specified data source. Before the target application container is downloaded from the cloud server, the crawl operations for the multiple data sources are packaged into the target application container, and the container is stored on the cloud server.
Because the target application container is stored on the cloud server, it can be downloaded when needed, and the crawl operations for the individual data sources in the container can be executed independently of one another. This property makes it possible to crawl information from the network with a computer cluster.
Specifically, each computer in the cluster downloads the target application container from the cloud server, each computer is assigned specified data sources, and each computer independently executes the crawl operations in the target application container for its assigned data sources.
After every computer has executed its crawl operations, the web addresses crawled by the cluster are merged, and the information of the specified data type is extracted from all of the web addresses. In this way the crawl operations are deployed across multiple machines.
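The assignment of data sources to cluster machines and the merging of their results can be sketched as below. The round-robin policy, the worker names, and the `example.com` addresses are hypothetical; the patent does not state how sources are distributed or how duplicates are handled.

```python
def assign_sources(sources, workers):
    """Assign data sources to worker machines round-robin (assumed policy)."""
    plan = {worker: [] for worker in workers}
    for i, source in enumerate(sources):
        plan[workers[i % len(workers)]].append(source)
    return plan

def merge_results(per_worker_urls):
    """Merge crawled addresses from all workers, dropping duplicates
    while preserving first-seen order."""
    seen, merged = set(), []
    for urls in per_worker_urls:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical data sources and worker names.
sources = ["https://a.example.com", "https://b.example.com",
           "https://c.example.com"]
plan = assign_sources(sources, ["worker-1", "worker-2"])
merged = merge_results([
    ["https://a.example.com/p1", "https://c.example.com/p9"],
    ["https://b.example.com/p2", "https://a.example.com/p1"],
])
print(plan)
print(merged)
```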
Optionally, the first model can be obtained through the following training process:
Step 31: obtain a fully connected neural network classification model to serve as the initial model;
Step 32: obtain multiple training sample pairs;
Step 33: train the initial model on the training sample pairs to obtain the first model.
In this optional embodiment, the first model is a fully connected neural network classification model based on a deep neural network (DNN). The fully connected neural network classification model serving as the initial model and the multiple training sample pairs are obtained first, and the initial model is then trained on the training sample pairs. The input of the model is the word vectors of the keywords, and its output is multiple (m) subject classification labels together with the probability of each label. The n (n ≤ m) subject classification labels with the highest probabilities are taken as the output subject classification labels.
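The inference side of such a classifier, including the selection of the n most probable of m labels, can be sketched in plain Python. The single hidden layer, the ReLU activation, and the hand-picked weights are assumptions for illustration; a trained model would supply its own parameters and architecture.

```python
import math

def dense(x, weights, bias):
    """One fully connected layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Convert scores to probabilities (shifted by the max for stability)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict_labels(x, params, labels, n=2):
    """Return the n labels with the highest probability, with n <= m."""
    hidden = relu(dense(x, params["W1"], params["b1"]))
    probs = softmax(dense(hidden, params["W2"], params["b2"]))
    ranked = sorted(zip(labels, probs), key=lambda pair: -pair[1])
    return ranked[:n]

# Hand-picked illustrative parameters, not trained values.
params = {
    "W1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
    "W2": [[2.0, 1.0, 0.0], [0.0, 0.0, 1.0]], "b2": [0.0, 0.0, 0.0],
}
labels = ["agriculture", "economy", "housing"]
result = predict_labels([1.0, 0.0], params, labels, n=2)
print(result)
```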
Optionally, the multiple training sample pairs can be obtained as follows:
Step 41: obtain multiple policy documents;
Step 42: preprocess each policy document to obtain a bag of words for each policy document, where each bag of words contains the vocabulary of the corresponding policy document;
Step 43: input the bags of words into a document topic generation model to obtain multiple topics and the multiple subject classification labels corresponding to each topic.
The policy documents can be obtained in any way; the embodiments of the present invention place no specific limit on this. After the policy documents are obtained, each one is preprocessed. Preprocessing includes at least a word segmentation operation, such as jieba word segmentation, which yields the words of each policy document and forms the bag of words for that document. The words in a bag of words have no ordering; the bag is simply a set of words. The bag of words is input into a document topic generation model (such as an LDA model). The output is the multiple topics corresponding to each bag of words, the keywords associated with each topic, and the association probability of each keyword with its topic. The keywords with the higher association probabilities are extracted as subject classification labels, and the bag of words together with the corresponding keywords forms one training sample pair: the bag of words serves as the input of the fully connected neural network model, and the corresponding keywords serve as its output.
The following example shows how one training sample pair is obtained with the above steps. A policy document is obtained and processed with jieba word segmentation, stop-word removal, and punctuation removal to produce its bag of words. The bag of words is input into an LDA model, and the output includes the topics "agriculture" and "economy", each with multiple keywords and their association probabilities with the topic. For example, the two keywords with the highest probability for the topic "agriculture" are "crops" and "farm machinery", and the keyword with the highest probability for the topic "economy" is "consumption". The keywords "crops", "farm machinery", and "consumption" are taken as the subject classification labels for the bag of words of this policy document, which yields one training sample pair.
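The label-selection step in the example above can be sketched as follows. The dictionary shape of the topic model output, the probabilities, and the `per_topic` and `min_prob` cut-offs are assumptions; the text only says that the keywords with the higher association probabilities become labels.

```python
def build_training_pair(bag_of_words, topic_keywords, per_topic=2, min_prob=0.1):
    """Build one (input, target) training pair from topic model output.

    topic_keywords maps each topic to (keyword, probability) tuples;
    this shape is an assumed stand-in for the output of an LDA model.
    """
    labels = []
    for topic, candidates in topic_keywords.items():
        ranked = sorted(candidates, key=lambda kp: -kp[1])
        for word, prob in ranked[:per_topic]:
            # Keep only keywords strongly associated with the topic.
            if prob >= min_prob and word not in labels:
                labels.append(word)
    return bag_of_words, labels

# Illustrative values echoing the example in the text.
bag = ["crops", "farm machinery", "consumption", "subsidy", "village"]
lda_output = {
    "agriculture": [("weather", 0.05), ("crops", 0.40), ("farm machinery", 0.30)],
    "economy": [("consumption", 0.50), ("price", 0.08)],
}
pair = build_training_pair(bag, lda_output)
print(pair[1])
```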
Optionally, after the policy information is associated with the matched subject classification labels and stored in the retrieval database, multiple items of policy information can be displayed side by side based on the subject classification label the user queries, compared across multiple attributes, for example region and release time. Specifically, this includes the following steps:
Step 51: obtain the subject classification label to be queried;
Step 52: determine, in the retrieval database, the multiple items of policy information corresponding to the subject classification label to be queried;
Step 53: extract the content of the specified attribute from each item of policy information, where the specified attribute is the attribute to be compared;
Step 54: display the content of the specified attribute of the multiple items of policy information side by side in the preset display mode.
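Steps 51 to 54 can be sketched with an in-memory list of records. The record fields, the sample policies, and the plain-text table rendering are hypothetical choices for this illustration; the preset display mode in the text is not specified.

```python
def compare_policies(policies, label, attributes):
    """Select policies carrying the label and extract the attributes to compare."""
    matched = [p for p in policies if label in p["labels"]]
    header = ["title"] + list(attributes)
    rows = [[p.get(name, "") for name in header] for p in matched]
    return header, rows

def render_table(header, rows):
    """Render the comparison as aligned plain-text columns."""
    widths = [max(len(str(cell)) for cell in column)
              for column in zip(header, *rows)]
    lines = []
    for row in [header] + rows:
        lines.append(" | ".join(str(cell).ljust(width)
                                for cell, width in zip(row, widths)))
    return "\n".join(lines)

# Hypothetical records as they might come back from the retrieval database.
policies = [
    {"title": "Farm machinery subsidy", "region": "Guangdong",
     "issued": "2019-03-01", "labels": ["agriculture", "economy"]},
    {"title": "Startup tax relief", "region": "Beijing",
     "issued": "2019-01-10", "labels": ["economy"]},
    {"title": "Crop insurance pilot", "region": "Henan",
     "issued": "2018-11-15", "labels": ["agriculture"]},
]
header, rows = compare_policies(policies, "agriculture", ["region", "issued"])
print(render_table(header, rows))
```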
Optionally, in the result to preset display mode display retrieval, it can be based on time attribute, on map one by one Show that certain a kind of policy conducts chain in the time that diverse geographic location set up the project, each policy is a node, can be with preset It is first usually to indicate, for example, after retrieving the agricultural policy established in a certain period, sequentially in time one by one in map The middle each policy node of display, the position of display is the corresponding city of policy, to optimize the effect of visualization of search result.Accordingly Ground, in the database, the storage mode of each policy are the form of chained list, and each data cell of chained list includes linked list units ID, the city of policy, Content of policy, policy ID, policy issuing time etc..The time for storing link, is when being set up the project according to policy Between sooner or later come it is fixed.Each chain storage of linked list is the policy information chained list for specifying investment policy field.
An embodiment that builds the linked list based on the project approval time is described in detail below.
When step 105 is executed to associate the policy information with the matched subject classification label and store it in the retrieval database, the following steps are performed:
Step 61: extract the project approval time from the policy information;
Step 62: insert the policy information, based on the project approval time, into the policy information linked list corresponding to the subject classification label, where the policy information linked list stores the policy information of the corresponding subject classification label in order of project approval time;
Step 63: when the multiple pieces of policy information corresponding to the subject classification label to be queried are determined in the retrieval database, look up the head address of the linked list corresponding to the subject classification label;
When step 54 is executed to display, in a preset display mode, the content of the specified attribute of the multiple pieces of policy information for comparison, the following steps are performed:
Step 64: display a preset map template;
Step 65: starting from the head address of the linked list, repeat the following steps until every piece of policy information in the linked list has been marked on the preset map template:
obtain the piece of policy information currently polled in the linked list, extract the project approval city from that policy information, and mark the position corresponding to the project approval city on the preset map template with a preset marker.
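Steps 61 to 65 can be sketched as a time-ordered singly linked list per subject classification label, traversed from its head to produce map markers. This is a minimal sketch, not the patent's implementation: the field names, the ISO date strings, the `CITY_COORDS` table of template positions, and the tuple returned as a "marker" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class PolicyNode:
    """One linked-list unit; field names are illustrative."""
    policy_id: str
    city: str
    approval_time: str  # ISO date string, so string order equals time order
    content: str = ""
    next: Optional["PolicyNode"] = None


class PolicyList:
    """Singly linked list kept sorted by approval time, earliest first."""

    def __init__(self) -> None:
        self.head: Optional[PolicyNode] = None

    def insert(self, node: PolicyNode) -> None:
        # Steps 61-62: place the node according to its project approval time.
        if self.head is None or node.approval_time < self.head.approval_time:
            node.next, self.head = self.head, node
            return
        cur = self.head
        while cur.next is not None and cur.next.approval_time <= node.approval_time:
            cur = cur.next
        node.next, cur.next = cur.next, node

    def __iter__(self) -> Iterator[PolicyNode]:
        # Step 65: walk from the head until every node has been visited.
        cur = self.head
        while cur is not None:
            yield cur
            cur = cur.next


# One list per subject classification label (step 63 looks up the head by label).
lists_by_label: Dict[str, PolicyList] = {}

# Hypothetical pixel positions of cities on the preset map template.
CITY_COORDS: Dict[str, Tuple[int, int]] = {
    "Shenzhen": (820, 610), "Chengdu": (560, 480), "Shanghai": (900, 450),
}


def store(label: str, node: PolicyNode) -> None:
    lists_by_label.setdefault(label, PolicyList()).insert(node)


def markers_for(label: str, coords: Dict[str, Tuple[int, int]]) -> List[tuple]:
    """Steps 64-65: one (city, position, policy_id) marker per node, in time order."""
    return [(n.city, coords.get(n.city), n.policy_id)
            for n in lists_by_label.get(label, PolicyList())]


store("agriculture", PolicyNode("P2", "Chengdu", "2019-03-01"))
store("agriculture", PolicyNode("P1", "Shenzhen", "2019-01-15"))
store("agriculture", PolicyNode("P3", "Shanghai", "2019-05-10"))
markers = markers_for("agriculture", CITY_COORDS)
```

Because the list is kept sorted at insertion time, the display step needs only a single pass from the head, which matches the one-by-one chronological display described above.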
Optionally, public-sentiment monitoring data can be updated for the policies already in the database. Specifically, the comment lists on news content about a specified policy can be crawled from news portal websites, and a small number of comments in each list selected as sample data, which are then manually labelled with a sentiment type. The sample data are further divided into two parts: one part serves as the training samples for a neural network model that labels comments with a sentiment type, and the other part serves as a validation set for verifying how well the neural network model labels sentiment types.
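The division of the manually labelled comments into a training part and a validation part could look like the sketch below. The 80/20 ratio, the fixed random seed, and the sentiment label names are assumptions; the patent does not specify them.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, str]  # (comment text, manually assigned sentiment type)


def split_samples(labelled: List[Sample], train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of the labelled comments and cut it into two parts."""
    rng = random.Random(seed)       # fixed seed keeps the split reproducible
    shuffled = labelled[:]          # do not mutate the caller's list
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labelled comments standing in for crawled news comments.
samples: List[Sample] = [(f"comment {i}", "positive" if i % 2 else "negative")
                         for i in range(10)]
train_set, validation_set = split_samples(samples)
```

The training part would feed the sentiment model, and the validation part would be held back to check its labelling quality.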
It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from the one given here.
From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 2
This embodiment further provides a device for processing policy information. The device is used to implement Embodiment 1 above and its preferred implementations. For terms or implementations not described in detail in this embodiment, refer to the corresponding description in Embodiment 1; what has already been described is not repeated.
As used below, the term "module" refers to a combination of software and/or hardware that can implement a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a schematic diagram of a device for processing policy information according to an embodiment of the present invention. As shown in Fig. 2, the device includes a first obtaining module 10, a preprocessing module 20, an extraction module 30, an input module 40, and a storage module 50.
The first obtaining module 10 is used to obtain policy information crawled from multiple data sources. The preprocessing module 20 is used to preprocess the policy information to obtain a target document, where the target document includes the text information in the policy information. The extraction module 30 is used to extract keywords from the target document. The input module 40 is used to input the extracted keywords into a first model to obtain a subject classification label matching the policy information, where the first model is a deep learning model trained in advance with multiple training sample pairs, and each training sample pair includes multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for the output data of the deep learning model. The storage module 50 is used to associate the policy information with the matched subject classification label and store them in a retrieval database.
Optionally, the first obtaining module includes: a download unit for downloading a preconfigured target application container from a cloud server; an execution unit for executing, in the target application container, a crawl operation against the multiple data sources; and an extraction unit for extracting the policy information from the crawled web addresses.
Optionally, the extraction module includes: an extracting unit for extracting the keywords from the target document based on a term frequency-inverse document frequency (TF-IDF) model; and a determination unit for determining the word embedding vector of each keyword using a preset word-vector correspondence.
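The two units of the extraction module can be illustrated with a small TF-IDF ranking followed by a table lookup. This is a sketch under assumptions: it uses the plain `tf * ln(N / df)` weighting without smoothing, documents are already tokenized, and the `EMBEDDINGS` table with two-dimensional vectors is a hypothetical stand-in for the preset word-vector correspondence.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_keywords(docs: List[List[str]], doc_index: int, top_k: int = 3) -> List[str]:
    """Rank the words of one tokenized document by TF-IDF and keep the top_k."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency of each word
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical preset word-vector correspondence (word -> embedding vector).
EMBEDDINGS: Dict[str, List[float]] = {
    "farm": [0.9, 0.1],
    "subsidy": [0.4, 0.7],
}


def embed(keywords: List[str], table: Dict[str, List[float]], dim: int = 2) -> List[List[float]]:
    """Look up each keyword; unknown words fall back to a zero vector."""
    return [table.get(w, [0.0] * dim) for w in keywords]


corpus = [
    ["subsidy", "farm", "farm", "policy"],
    ["tax", "bank", "policy"],
    ["policy", "school", "teacher"],
]
keywords = tfidf_keywords(corpus, 0, top_k=2)
vectors = embed(keywords, EMBEDDINGS)
```

In this toy corpus the word "policy" appears in every document, so its inverse document frequency is zero and it is ranked below the words specific to the first document. For Chinese policy text, a word segmentation step would have to run before this ranking.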
Optionally, the device further includes: a second obtaining module for obtaining, before the extracted keywords are input into the first model to obtain the subject classification label matching the policy information, a fully connected neural network classification model to serve as an initial model; a third obtaining module for obtaining multiple training sample pairs; and a training module for training the fully connected neural network classification model with the multiple training sample pairs to obtain the first model.
Optionally, the third obtaining module includes: a first obtaining unit for obtaining multiple policy documents; a preprocessing unit for preprocessing each policy document to obtain a vocabulary bag corresponding to each policy document, where each vocabulary bag includes the vocabulary in the corresponding policy document; and an input unit for inputting the multiple vocabulary bags into a document topic generation model to obtain multiple topics and multiple subject classification labels corresponding to each topic, where each training sample pair includes a vocabulary bag serving as input and the multiple subject classification labels corresponding to that vocabulary bag serving as the training target for the output.
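How a vocabulary bag and a training sample pair might be assembled is sketched below. The sketch does not implement the document topic generation model: the labels are passed in as if that model had already produced them. The regular-expression tokenizer, the stopword set, and the example labels are assumptions chosen for an English toy input.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical stopword list used during preprocessing.
STOPWORDS = {"the", "of", "and", "for", "to", "in", "a"}


def vocabulary_bag(text: str) -> Counter:
    """Preprocess one policy document into a bag of words (word -> count)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def build_training_pairs(documents: Sequence[str],
                         labels_per_document: Sequence[Sequence[str]]
                         ) -> List[Tuple[Counter, List[str]]]:
    """Pair each vocabulary bag (input) with its subject labels (training target).

    labels_per_document stands in for the output of the topic model.
    """
    if len(documents) != len(labels_per_document):
        raise ValueError("each document needs exactly one list of labels")
    return [(vocabulary_bag(doc), list(labels))
            for doc, labels in zip(documents, labels_per_document)]


pairs = build_training_pairs(
    ["Subsidy for the farm and the farm", "Tax relief for a bank"],
    [["agriculture"], ["finance", "tax"]],
)
```

Each resulting pair has the shape described above: the bag is the model input and the label list is the target the classifier is trained to output.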
Optionally, the device further includes: a fourth obtaining module for obtaining a subject classification label to be queried after the policy information has been associated with the matched subject classification label and stored in the retrieval database; a determining module for determining, in the retrieval database, the multiple pieces of policy information corresponding to the subject classification label to be queried; a fifth obtaining module for extracting the content of a specified attribute from each piece of policy information, where the specified attribute is the attribute to be compared; and a display module for displaying, in a preset display mode, the content of the specified attribute of the multiple pieces of policy information for comparison.
Optionally, the storage module includes: a second obtaining unit for extracting the project approval time from the policy information; and an insertion unit for inserting the policy information, based on the project approval time, into the policy information linked list corresponding to the subject classification label, where the policy information linked list stores the policy information of the corresponding subject classification label in order of project approval time. The determining module includes: a lookup unit for looking up the head address of the linked list corresponding to the subject classification label. The display module includes: a display unit for displaying a preset map template; and an execution unit for repeating, starting from the head address of the linked list, the following steps until every piece of policy information in the linked list has been marked on the preset map template: obtain the piece of policy information currently polled in the linked list, extract the project approval city from that policy information, and mark the position corresponding to the project approval city on the preset map template with a preset marker.
It should be noted that each of the above modules can be implemented in software or hardware. In the latter case, this can be done in the following ways, among others: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be executed in an order different from the one given here, or the modules or steps can each be made into an individual integrated circuit module, or several of them can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
Embodiment 3
An embodiment of the present invention further provides a storage medium in which a computer program is stored, where the computer program is configured to execute, when run, the steps of any one of the method embodiments above.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Embodiment 4
An embodiment of the present invention further provides an electronic device including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps of any one of the method embodiments above.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor. Taking a mobile terminal as an example of the electronic device, Fig. 3 is a hardware block diagram of a mobile terminal according to an embodiment of the present invention. As shown in Fig. 3, the mobile terminal may include one or more processors 302 (only one is shown in Fig. 3; the processor 302 may include, but is not limited to, a processing unit such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 304 for storing data. Optionally, the mobile terminal may further include a transmission device 306 for communication functions and an input/output device 308. Those of ordinary skill in the art will understand that the structure shown in Fig. 3 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in Fig. 3, or have a configuration different from that shown in Fig. 3.
The memory 304 can be used to store computer programs, for example the software programs and modules of application software, such as the computer program corresponding to the image recognition method in the embodiments of the present invention. By running the computer program stored in the memory 304, the processor 302 executes various functional applications and data processing, that is, implements the method described above. The memory 304 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 304 may further include memory located remotely from the processor 302, and such remote memory can be connected to the mobile terminal through a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 306 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 306 can be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for processing policy information, characterized in that the method comprises:
obtaining policy information crawled from multiple data sources;
preprocessing the policy information to obtain a target document, wherein the target document includes the text information in the policy information;
extracting keywords from the target document;
inputting the extracted keywords into a first model to obtain a subject classification label matching the policy information, wherein the first model is a deep learning model trained in advance with multiple training sample pairs, and each training sample pair includes multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for the output data of the deep learning model;
associating the policy information with the matched subject classification label and storing them in a retrieval database.
2. The method according to claim 1, characterized in that obtaining the policy information crawled from multiple data sources comprises:
downloading a preconfigured target application container from a cloud server;
executing, in the target application container, a crawl operation against the multiple data sources;
extracting the policy information from the crawled web addresses.
3. The method according to claim 1, characterized in that extracting the keywords from the target document comprises:
extracting the keywords from the target document based on a term frequency-inverse document frequency (TF-IDF) model;
determining the word embedding vector of each keyword using a preset word-vector correspondence.
4. The method according to claim 1, characterized in that, before inputting the extracted keywords into the first model to obtain the subject classification label matching the policy information, the method further comprises:
obtaining a fully connected neural network classification model to serve as an initial model;
obtaining multiple training sample pairs;
training the fully connected neural network classification model with the multiple training sample pairs to obtain the first model.
5. The method according to claim 4, characterized in that obtaining the multiple training sample pairs comprises:
obtaining multiple policy documents;
preprocessing each policy document to obtain a vocabulary bag corresponding to each policy document, wherein each vocabulary bag includes the vocabulary in the corresponding policy document;
inputting the multiple vocabulary bags into a document topic generation model to obtain multiple topics and multiple subject classification labels corresponding to each topic, wherein each training sample pair includes a vocabulary bag serving as input and the multiple subject classification labels corresponding to that vocabulary bag serving as the training target for the output.
6. The method according to claim 1, characterized in that, after associating the policy information with the matched subject classification label and storing them in the retrieval database, the method further comprises:
obtaining a subject classification label to be queried;
determining, in the retrieval database, the multiple pieces of policy information corresponding to the subject classification label to be queried;
extracting the content of a specified attribute from each piece of policy information, wherein the specified attribute is the attribute to be compared;
displaying, in a preset display mode, the content of the specified attribute of the multiple pieces of policy information for comparison.
7. The method according to claim 6, characterized in that
associating the policy information with the matched subject classification label and storing them in the retrieval database comprises: extracting the project approval time from the policy information; and inserting the policy information, based on the project approval time, into the policy information linked list corresponding to the subject classification label, wherein the policy information linked list stores the policy information of the corresponding subject classification label in order of project approval time;
determining, in the retrieval database, the multiple pieces of policy information corresponding to the subject classification label to be queried comprises: looking up the head address of the linked list corresponding to the subject classification label;
displaying, in the preset display mode, the content of the specified attribute of the multiple pieces of policy information for comparison comprises: displaying a preset map template; and, starting from the head address of the linked list, repeating the following steps until every piece of policy information in the linked list has been marked on the preset map template: obtaining the piece of policy information currently polled in the linked list, extracting the project approval city from that policy information, and marking the position corresponding to the project approval city on the preset map template with a preset marker.
8. A device for processing policy information, characterized in that the device comprises:
an obtaining module for obtaining policy information crawled from multiple data sources;
a preprocessing module for preprocessing the policy information to obtain a target document, wherein the target document includes the text information in the policy information;
an extraction module for extracting keywords from the target document;
an input module for inputting the extracted keywords into a first model to obtain a subject classification label matching the policy information, wherein the first model is a deep learning model trained in advance with multiple training sample pairs, and each training sample pair includes multiple keywords serving as input data of the deep learning model and at least one subject classification label serving as the training target for the output data of the deep learning model;
a storage module for associating the policy information with the matched subject classification label and storing them in a retrieval database.
9. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 7.
10. An electronic device including a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program so as to execute the method according to any one of claims 1 to 7.
CN201910390294.6A 2019-05-10 2019-05-10 Processing method, device and storage medium, the electronic device of policy information Pending CN110275935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910390294.6A CN110275935A (en) 2019-05-10 2019-05-10 Processing method, device and storage medium, the electronic device of policy information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910390294.6A CN110275935A (en) 2019-05-10 2019-05-10 Processing method, device and storage medium, the electronic device of policy information

Publications (1)

Publication Number Publication Date
CN110275935A true CN110275935A (en) 2019-09-24

Family

ID=67959048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910390294.6A Pending CN110275935A (en) 2019-05-10 2019-05-10 Processing method, device and storage medium, the electronic device of policy information

Country Status (1)

Country Link
CN (1) CN110275935A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866116A (en) * 2019-10-25 2020-03-06 远光软件股份有限公司 Policy document processing method and device, storage medium and electronic equipment
CN110909122A (en) * 2019-10-10 2020-03-24 重庆金融资产交易所有限责任公司 Information processing method and related equipment
CN111046225A (en) * 2019-12-20 2020-04-21 网易(杭州)网络有限公司 Audio resource processing method, device, equipment and storage medium
CN111126879A (en) * 2019-12-31 2020-05-08 厦门美契信息技术有限公司 Green financial item selection evaluation method
CN111177794A (en) * 2019-12-10 2020-05-19 平安医疗健康管理股份有限公司 City image method, device, computer equipment and storage medium
CN111241110A (en) * 2020-02-03 2020-06-05 广州欧赛斯信息科技有限公司 Data management method based on job education diagnosis and modification platform
CN111326142A (en) * 2020-01-21 2020-06-23 青梧桐有限责任公司 Text information extraction method and system based on voice-to-text and electronic equipment
CN111400369A (en) * 2020-03-06 2020-07-10 湖南城市学院 Big data analysis-based policy information service system and method
CN111475647A (en) * 2020-03-19 2020-07-31 平安国际智慧城市科技股份有限公司 Document processing method and device and server
CN111506628A (en) * 2020-04-22 2020-08-07 中国民航信息网络股份有限公司 Data processing method and device
CN111652524A (en) * 2020-06-11 2020-09-11 中力数创(重庆)科技有限公司 Method and device for intelligently matching policy and guiding improvement path
CN112052305A (en) * 2020-09-02 2020-12-08 平安资产管理有限责任公司 Information extraction method and device, computer equipment and readable storage medium
CN112131385A (en) * 2020-09-15 2020-12-25 天津大学 Structure analysis method of privacy policy
CN112307210A (en) * 2020-11-06 2021-02-02 中冶赛迪工程技术股份有限公司 Document tag prediction method, system, medium and electronic device
CN112541352A (en) * 2020-12-23 2021-03-23 上海永骁智能技术有限公司 Policy interpretation method based on deep learning
CN112765338A (en) * 2020-12-30 2021-05-07 江苏风云科技服务有限公司 Policy data pushing method, policy calculator and computer equipment
CN112906382A (en) * 2021-02-05 2021-06-04 山东省计算中心(国家超级计算济南中心) Policy text multi-label labeling method and system based on graph neural network
CN112995243A (en) * 2019-12-02 2021-06-18 重庆市科学技术研究院 Big data-based policy information pushing method and system
CN113469645A (en) * 2021-06-21 2021-10-01 广州政企互联科技有限公司 Intelligent storage method for policy data
CN113723737A (en) * 2021-05-11 2021-11-30 天元大数据信用管理有限公司 Enterprise portrait-based policy matching method, device, equipment and medium
CN114510566A (en) * 2021-11-29 2022-05-17 上海市黄浦区城市运行管理中心(上海市黄浦区城市网格化综合管理中心、上海市黄浦区大数据中心) Hot word mining, classifying and analyzing method and system based on work order
CN114722801A (en) * 2020-12-22 2022-07-08 航天信息股份有限公司 Government affair data classification storage method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014063295A (en) * 2012-09-20 2014-04-10 Cybernet Systems Co Ltd Context analyzing apparatus, information sorting apparatus and information classification system
CN106649875A (en) * 2017-01-04 2017-05-10 成都四方伟业软件股份有限公司 Visualization system of public opinion big data
US20170169103A1 (en) * 2015-12-10 2017-06-15 Agile Data Decisions LLC Method and system for extracting, verifying and cataloging technical information from unstructured documents
CN108491438A (en) * 2018-02-12 2018-09-04 陆夏根 A kind of technology policy retrieval analysis method
CN109033358A (en) * 2018-07-26 2018-12-18 李辰洋 News Aggreagation and the associated method of intelligent entity

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909122A (en) * 2019-10-10 2020-03-24 重庆金融资产交易所有限责任公司 Information processing method and related equipment
CN110909122B (en) * 2019-10-10 2023-10-03 湖北华中电力科技开发有限责任公司 Information processing method and related equipment
CN110866116A (en) * 2019-10-25 2020-03-06 远光软件股份有限公司 Policy document processing method and device, storage medium and electronic equipment
CN112995243A (en) * 2019-12-02 2021-06-18 重庆市科学技术研究院 Big data-based policy information pushing method and system
CN111177794A (en) * 2019-12-10 2020-05-19 平安医疗健康管理股份有限公司 City image method, device, computer equipment and storage medium
CN111046225B (en) * 2019-12-20 2024-01-26 网易(杭州)网络有限公司 Audio resource processing method, device, equipment and storage medium
CN111046225A (en) * 2019-12-20 2020-04-21 网易(杭州)网络有限公司 Audio resource processing method, device, equipment and storage medium
CN111126879A (en) * 2019-12-31 2020-05-08 厦门美契信息技术有限公司 Green financial item selection evaluation method
CN111126879B (en) * 2019-12-31 2024-05-31 厦门美契信息技术有限公司 Green melt item selection evaluation method
CN111326142A (en) * 2020-01-21 2020-06-23 青梧桐有限责任公司 Text information extraction method and system based on voice-to-text and electronic equipment
CN111241110A (en) * 2020-02-03 2020-06-05 广州欧赛斯信息科技有限公司 Data management method based on job education diagnosis and modification platform
CN111241110B (en) * 2020-02-03 2023-06-06 广州欧赛斯信息科技有限公司 Data management method based on staff and education diagnosis and improvement platform
CN111400369A (en) * 2020-03-06 2020-07-10 湖南城市学院 Big data analysis-based policy information service system and method
CN111475647A (en) * 2020-03-19 2020-07-31 平安国际智慧城市科技股份有限公司 Document processing method and device and server
CN111506628A (en) * 2020-04-22 2020-08-07 中国民航信息网络股份有限公司 Data processing method and device
CN111652524A (en) * 2020-06-11 2020-09-11 中力数创(重庆)科技有限公司 Method and device for intelligently matching policy and guiding improvement path
CN112052305A (en) * 2020-09-02 2020-12-08 平安资产管理有限责任公司 Information extraction method and device, computer equipment and readable storage medium
CN112131385A (en) * 2020-09-15 2020-12-25 天津大学 Structure analysis method of privacy policy
CN112307210A (en) * 2020-11-06 2021-02-02 中冶赛迪工程技术股份有限公司 Document tag prediction method, system, medium and electronic device
CN112307210B (en) * 2020-11-06 2024-07-30 中冶赛迪工程技术股份有限公司 Document tag prediction method, system, medium and electronic device
CN114722801A (en) * 2020-12-22 2022-07-08 航天信息股份有限公司 Government affair data classification storage method and related device
CN112541352A (en) * 2020-12-23 2021-03-23 上海永骁智能技术有限公司 Policy interpretation method based on deep learning
CN112765338A (en) * 2020-12-30 2021-05-07 江苏风云科技服务有限公司 Policy data pushing method, policy calculator and computer equipment
CN112906382A (en) * 2021-02-05 2021-06-04 山东省计算中心(国家超级计算济南中心) Policy text multi-label labeling method and system based on graph neural network
CN113723737A (en) * 2021-05-11 2021-11-30 天元大数据信用管理有限公司 Enterprise portrait-based policy matching method, device, equipment and medium
CN113469645A (en) * 2021-06-21 2021-10-01 广州政企互联科技有限公司 Intelligent storage method for policy data
CN114510566A (en) * 2021-11-29 2022-05-17 上海市黄浦区城市运行管理中心(上海市黄浦区城市网格化综合管理中心、上海市黄浦区大数据中心) Hot word mining, classifying and analyzing method and system based on work order

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination