CN108984694B - Webpage processing method and device, storage medium, and electronic device
- Publication number: CN108984694B
- Authority: CN (China)
- Prior art keywords: webpage, individual, value, language, population
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention provides a webpage processing method and device, a storage medium, and an electronic device. The method comprises: obtaining text attribute values of a webpage in a training sample that contains a first language; using the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage content is body text predominantly in the first language; determining the adaptive value of the population individuals in the perceptron neural network according to the second parameter value and the third parameter value; decoding the individual with the optimal adaptive value in the population to obtain the connection weights and bias of the perceptron neural network; and determining, based on the connection weights and bias, whether a webpage to be processed is body text predominantly in the first language. The invention solves the problem in the related art that the parameters for extracting a webpage are set in advance based on experience and on the structural features of the webpage, so that improper parameter settings lead to inaccurate extraction of the webpage body text, and thereby improves the user experience.
Description
Technical field
The present invention relates to the field of communications, and in particular to a webpage processing method and device, a storage medium, and an electronic device.
Background
In the schemes for extracting webpage body text provided by the prior art, whether a piece of webpage content is body content is inferred from content statistics of the webpage tags. These techniques depend heavily on manually set parameters, and different parameters have to be set for different webpages based on experience. After a webpage is loaded in a browser, its content is split, the matching-rule file in the browser is used to locate the webpage content, and the required field content is extracted and displayed, so that the user sees the webpage after body-text filtering and can read conveniently and with focus.
Body-text extraction for Chinese webpages can be regarded as a classification problem: each piece of webpage text content is classified as body content or not, and the body content is extracted according to the classification results. However, the prior art is strongly tied to manually set parameters. Different parameters must be set for different webpages based on experience, and these methods place high demands on the parameter settings: if the parameters are set improperly, the webpage body text is extracted inaccurately.
No effective solution to the above problems in the related art has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a webpage processing method and device, a storage medium, and an electronic device, to at least solve the problem in the related art that the parameters for extracting a webpage are set in advance based on experience and on the structural features of the webpage, so that improper parameter settings lead to inaccurate extraction of the webpage body text.
According to one embodiment of the present invention, a webpage processing method is provided, comprising: obtaining text attribute values of a webpage in a training sample that contains a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage content is body text predominantly in the first language; using the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage content is body text predominantly in the first language; determining the adaptive value of the population individuals in the perceptron neural network according to the second parameter value and the third parameter value; decoding the individual with the optimal adaptive value in the population to obtain the connection weights and bias of the perceptron neural network; and determining, based on the connection weights and bias, whether a webpage to be processed is body text predominantly in the first language.
According to another embodiment of the present invention, a webpage processing device is provided, comprising: a first obtaining module, configured to obtain text attribute values of a webpage in a training sample that contains a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage content is body text predominantly in the first language; a first determining module, configured to use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage content is body text predominantly in the first language; a second determining module, configured to determine the adaptive value of the population individuals in the perceptron neural network according to the second parameter value and the third parameter value; a decoding module, configured to decode the individual with the optimal adaptive value in the population to obtain the connection weights and bias of the perceptron neural network; and a third determining module, configured to determine, based on the connection weights and bias, whether a webpage to be processed is body text predominantly in the first language.
According to still another embodiment of the present invention, a storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to perform, when run, the steps in any of the above method embodiments.
According to still another embodiment of the present invention, an electronic device is also provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
Through the present invention, the text attribute values of a webpage in the training sample that contains the first language are obtained, the adaptive values of the population individuals in the perceptron neural network are determined on that basis, and the connection weights and bias of the perceptron neural network are then determined. When webpage text is to be processed, the connection weights and bias of the perceptron neural network can therefore be used to determine whether that text is body text predominantly in the first language. The body text of a webpage thus no longer has to be determined from parameters set in advance; it is determined by the trained perceptron neural network. This solves the problem in the related art that the parameters for extracting a webpage are set in advance based on experience and on the structural features of the webpage, so that improper parameter settings lead to inaccurate extraction of the webpage body text, and it achieves the effect of improving the user experience.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a terminal for the webpage processing method of an embodiment of the present invention;
Fig. 2 is a flowchart of the webpage processing method according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of the webpage processing device according to an embodiment of the present invention.
Detailed description of the embodiments
Hereinafter, the present invention is described in detail with reference to the drawings and in combination with embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments can be combined with one another.
It should be noted that the terms "first", "second", etc. in the description, the claims, and the above drawings are used to distinguish similar objects and are not used to describe a particular order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of this application can be executed in a mobile terminal, a terminal device, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a terminal for the webpage processing method of an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing unit such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 can be used to store computer programs, for example the software programs and modules of application software, such as the computer program corresponding to the webpage processing method in the embodiments of the present invention. The processor 102 runs the computer program stored in the memory 104 to execute various functional applications and data processing, that is, to implement the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory can be connected to the mobile terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 can be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a webpage processing method. Fig. 2 is a flowchart of the webpage processing method according to an embodiment of the present invention. As shown in Fig. 2, the process includes the following steps:
Step S202, obtain text attribute values of a webpage in a training sample that contains a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage content is body text predominantly in the first language;
Step S204, use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage content is body text predominantly in the first language;
Step S206, determine the adaptive value of the population individuals in the perceptron neural network according to the second parameter value and the third parameter value;
Step S208, decode the individual with the optimal adaptive value in the population to obtain the connection weights and bias of the perceptron neural network;
Step S210, determine, based on the connection weights and bias, whether a webpage to be processed is body text predominantly in the first language.
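Steps S204 and S206 can be illustrated with a minimal Python sketch. It assumes a three-layer perceptron with a tanh hidden layer and a sigmoid output, and it assumes that the adaptive value is the mean squared error between the second parameter values (labels) and the third parameter values (network outputs). The source fixes neither the transfer functions nor the error measure, and the names `forward` and `adaptive_value` are hypothetical.

```python
import math


def forward(x, w_in, w_out, b_out):
    """Network output (the "third parameter value") for one input vector x.

    w_in  : one weight row per hidden neuron (assumed layout)
    w_out : one weight per hidden neuron
    b_out : output bias
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    # Sigmoid written via tanh so that large |z| cannot overflow.
    return 0.5 * (1.0 + math.tanh(z / 2.0))


def adaptive_value(samples, labels, w_in, w_out, b_out):
    """Assumed adaptive value: mean squared error between labels and outputs."""
    errors = [(y - forward(x, w_in, w_out, b_out)) ** 2
              for x, y in zip(samples, labels)]
    return sum(errors) / len(errors)
```

Under this assumption a smaller adaptive value is better, so the optimal individual in step S208 would be the one that minimizes it.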
Through the above steps S202 to S210, the text attribute values of a webpage in the training sample that contains the first language are obtained, the adaptive values of the population individuals in the perceptron neural network are determined on that basis, and the connection weights and bias of the perceptron neural network are then determined. When webpage text is to be processed (new webpage text), the connection weights and bias of the perceptron neural network can therefore be used to determine whether the new webpage text is body text predominantly in the first language. The body text of a webpage thus no longer has to be determined from parameters set in advance; it is determined by the trained perceptron neural network. This solves the problem in the related art that the parameters for extracting a webpage are set in advance based on experience and on the structural features of the webpage, so that improper parameter settings lead to inaccurate extraction of the webpage body text, and it achieves the effect of improving the user experience.
It should be noted that the first language in this embodiment can be Chinese, Korean, Japanese, etc., and can be configured according to the needs of the user.
In an optional implementation of this embodiment, the text attribute values of a webpage in the training sample that contains the first language (step S202) can be obtained as follows:
Step S202-1: obtain the proportion of the first language in the webpage that contains the first language, the number of characters in the first language, and the total number of characters of the webpage;
Step S202-2: determine the mean of the proportion, the variance of the proportion, the mean of the character count, and the variance of the character count from the proportion and the character count;
Step S202-3: take the proportion of the first language, the number of characters in the first language, the total number of characters of the webpage, the mean of the proportion, the variance of the proportion, the mean of the character count, and the variance of the character count as the first parameter value;
Step S202-4: determine the second parameter value based on the first parameter value.
Steps S202-1 to S202-4 are illustrated below, taking Chinese as the first language. In an optional implementation of this embodiment, steps S202-1 to S202-4 may include:
Step 1, for a number of webpages, extract from each webpage, according to its HTML structure, every tag that contains Chinese content (usually a div tag), and put it into the tag information list labellist = {L(1), L(2), ..., L(i), ..., L(num)}, where num is the number of tags and L(i) = {L(i, j)}, j = 1, 2, is the tag information: L(i, 1) stores the tag content, and L(i, 2) stores the status word indicating whether the tag is body text.
Step 2, calculate the Chinese proportion inta and the number of Chinese characters (chinese Number) in each tag;
Step 3, according to the character encoding of Chinese characters, count the number of Chinese characters CN(ki) in L(ki) and the number of text characters AN(ki) appearing in the entire tag, and then calculate the Chinese proportion inta(ki) of L(ki) by the formula:
inta(ki) = CN(ki) / AN(ki)
Step 4, calculate the text attribute value power(ki) from the Chinese proportion inta(ki) and the number of Chinese characters CN(ki). inta and CN are first normalized, and the specific formulas are as follows:
Norinta(ki) = (inta(ki) - intamean) / stdinta
NorCN(ki) = (CN(ki) - CNmean) / stdCN
power(ki) = Norinta(ki) * NorCN(ki)
where intamean is the mean of inta, stdinta is the variance of inta, CNmean is the mean of CN, and stdCN is the variance of CN;
Step 5, obtain for each tag the information vector = {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki), L(ki, 2)}, eight parameters in total.
Here intamean is the mean of the Chinese proportion over all tags, stdinta is the variance of the Chinese proportion over all tags, CNmean is the mean number of Chinese characters over all tags, stdCN is the variance of the number of Chinese characters over all tags, CN(ki) is the number of Chinese characters in the tag, AN(ki) is the number of text characters in the entire tag content, and L(ki, 2) stores the status word indicating whether the tag is body text.
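Steps 1 to 5 can be sketched in Python as follows. The sketch assumes that the tag texts have already been extracted from the HTML, that a Chinese character is any code point in the CJK Unified Ideographs range U+4E00 to U+9FFF, and that stdinta and stdCN are population variances as the text states (the names suggest that a standard deviation may be intended). The names `is_chinese` and `tag_features` are hypothetical.

```python
import statistics


def is_chinese(ch: str) -> bool:
    # Assumption: "Chinese character" means the CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def tag_features(tag_texts):
    """Return one row [intamean, stdinta, CNmean, stdCN, AN, CN, power] per tag."""
    cn = [sum(is_chinese(c) for c in text) for text in tag_texts]  # CN(ki)
    an = [len(text) for text in tag_texts]                         # AN(ki)
    inta = [c / a if a else 0.0 for c, a in zip(cn, an)]           # inta(ki)

    intamean = statistics.fmean(inta)
    cnmean = statistics.fmean(cn)
    stdinta = statistics.pvariance(inta)  # variance, as stated in the text
    stdcn = statistics.pvariance(cn)

    rows = []
    for c, a, ratio in zip(cn, an, inta):
        # Guard against a zero spread (all tags identical); not covered by the text.
        norinta = (ratio - intamean) / stdinta if stdinta else 0.0
        norcn = (c - cnmean) / stdcn if stdcn else 0.0
        rows.append([intamean, stdinta, cnmean, stdcn, a, c, norinta * norcn])
    return rows
```

The eighth parameter, L(ki, 2), is the training label and would be supplied separately for each tag.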
In another optional implementation of this embodiment, the population used in steps S202 to S210 to determine the connection weights and bias of the perceptron neural network can be generated by the following method, whose steps include:
Step 11, set the lower bound LB_j and the upper bound UB_j of the DIM optimization design parameters of the three-layer perceptron neural network, where subscript j = 1, 2, ..., D;
Step 12, randomly generate a first population P_t containing Popsize individuals, where each individual in the first population stores the DIM parameters to be optimized;
Here subscript i = 1, 2, ..., Popsize indexes the i-th individual in P_t (the expressions for P_t and its individuals are not reproduced in this text).
Optionally, the individuals are randomly initialized (the initialization formula is not reproduced in this text), where subscript j = 1, 2, ..., D, and rand(0, 1) is a function that generates random real numbers uniformly distributed on [0, 1].
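Step 12 can be sketched as below. Because the initialization formula is not reproduced in the text, the sketch assumes the common form LB_j + rand(0, 1) × (UB_j - LB_j) for each parameter. The name `init_population` is hypothetical, and individual velocities are not modeled.

```python
import random


def init_population(popsize, lb, ub, rng=random):
    """Create popsize individuals; parameter j is drawn uniformly in [lb[j], ub[j]].

    Assumed formula: x_j = LB_j + rand(0, 1) * (UB_j - LB_j).
    """
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub)]
        for _ in range(popsize)
    ]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.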
In another optional implementation of this embodiment, the individual with the optimal adaptive value in the population is decoded to obtain the connection weights and bias of the three-layer perceptron neural network by the following method:
Step 20: set the maximum number of evaluations MAX_FEs and the initial current evaluation count FEs = 0, let the current generation t = 0, and calculate the adaptive value of each individual in the first population P_t;
Step 21: let the current evaluation count FEs = FEs + Popsize, and save the individual Best_t with the optimal adaptive value;
Step 22: calculate the current gravitational constant G_t and the mass of each individual in the first population P_t (the formula for G_t is not reproduced in this text);
Step 23: determine the current number of elite individuals KBest_t in the first population P_t according to the current gravitational constant G_t and the mass of each individual (the formula for KBest_t is not reproduced in this text);
Step 24: update the acceleration, velocity, and position of each individual in the first population P_t to obtain a second population, calculate the adaptive value of each individual in the second population, and let the current evaluation count FEs = FEs + Popsize;
Step 25: generate an intermediate chaos factor cf for the second population;
Step 26: randomly generate a positive integer R1 in [1, Popsize] for the second population, randomly generate a positive integer R2 not equal to R1 in [1, Popsize], and calculate the intermediate chaos factor cf;
Step 27: generate an individual U_t (the generation formula is not reproduced in this text);
Step 28: calculate the adaptive value Fit(U_t) of the individual U_t; if Fit(U_t) is better than the adaptive value of the individual it is compared with (whose symbol is not reproduced in this text), go to step 29; otherwise go to step 26;
Step 29: let the current generation t = t + 1 and save the optimum individual Best_t in the second population; once the current evaluation count FEs exceeds MAX_FEs, decode the resulting optimum individual Best_t into the connection weights and bias of the three-layer perceptron neural network.
In addition, the intermediate chaos factor cf in this embodiment can be determined as follows:
Step 30: initialize an intermediate chaos factor and set its number of updates Num;
Step 31: set the initialized intermediate chaos factor to a random real number within a first value range; if it equals a second preset value, regenerate it until it no longer equals the second preset value, where the second preset value lies within the first value range;
Step 32: let the counter ki = 1 initially; if the counter ki is greater than Num, randomly select an individual and perform a chaotic local search on it; otherwise go to step 33;
Step 33: update the initialized intermediate chaos factor to obtain the intermediate chaos factor cf, using the formula cf = 4 × cf × (1 - cf);
Step 34: let the counter ki = ki + 1 and return to the comparison in step 32.
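Steps 30 to 34 amount to iterating the logistic map Num times from a random start. The sketch below takes the first value range to be [0, 1] and the excluded values to be 0.25, 0.5, and 0.75, which are the values the specific embodiment names in step S304-172; the function name `chaos_factor` is hypothetical.

```python
import random


def chaos_factor(num_updates, rng=random):
    """Return the intermediate chaos factor cf after num_updates logistic-map steps."""
    cf = rng.random()
    # Regenerate excluded start values (steps 31 / S304-172).
    while cf in (0.25, 0.5, 0.75):
        cf = rng.random()
    # Counter ki runs from 1 to Num; each pass applies cf = 4 * cf * (1 - cf).
    for _ in range(num_updates):
        cf = 4.0 * cf * (1.0 - cf)
    return cf
```

The excluded start values are the ones for which this map stops moving: 0.75 maps to itself, 0.25 maps to 0.75, and 0.5 maps to 1 and then to 0, which also maps to itself.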
The method steps of this embodiment are described in detail below with reference to a specific implementation, which provides a Chinese webpage body-text extraction method based on gravitational search optimization. The steps of the method include:
Step S302, perform text training with 7 parameters of each tag as input and one parameter as output; it should be noted that 7 parameters are preferred in this embodiment, but another number may be set as needed.
Step S304, obtain a neural network model from the text training result;
Step S306, for new text, determine the text type to which it belongs according to the trained neural network model.
First, the text training in step S302 is carried out as follows:
Step S302-1, for a number of webpages, extract from each webpage, according to its HTML structure, every tag that contains Chinese content (usually a div tag), and put it into the tag information list labellist = {L(1), L(2), ..., L(i), ..., L(num)}, where num is the number of tags and L(i) = {L(i, j)}, j = 1, 2, is the tag information: L(i, 1) stores the tag content, and L(i, 2) stores the status word indicating whether the tag is body text.
Step S302-2, calculate the Chinese proportion inta and the number of Chinese characters (chinese Number) in each tag;
Step S302-3, according to the character encoding of Chinese characters, count the number of Chinese characters CN(ki) in L(ki) and the number of text characters AN(ki) in the entire tag content, and then calculate the Chinese proportion inta(ki) of L(ki) by the formula:
inta(ki) = CN(ki) / AN(ki);
Step S302-4, calculate the text attribute value power(ki) from the Chinese proportion inta(ki) and the number of Chinese characters CN(ki). inta and CN are first normalized, and the specific formulas are as follows:
Norinta(ki) = (inta(ki) - intamean) / stdinta
NorCN(ki) = (CN(ki) - CNmean) / stdCN
power(ki) = Norinta(ki) * NorCN(ki);
where intamean is the mean of inta, stdinta is the variance of inta, CNmean is the mean of CN, and stdCN is the variance of CN.
Step S302-5, obtain for each tag the information vector = {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki), L(ki, 2)}, eight parameters in total.
Here intamean is the mean of the Chinese proportion over all tags, stdinta is the variance of the Chinese proportion over all tags, CNmean is the mean number of Chinese characters over all tags, stdCN is the variance of the number of Chinese characters over all tags, CN(ki) is the number of Chinese characters in the tag, AN(ki) is the number of text characters in the entire tag content, and L(ki, 2) stores the status word indicating whether the tag is body text.
The neural network model in step S304 is trained as follows:
Step S304-1, extract the training samples: the first 80% are set as the training data set of the neural network, with TraNum groups of data, and the last 20% are set as the test data set, with TestNum groups of data;
Step S304-2, the user initializes the parameters: the population size Popsize, the maximum number of evaluations MAX_FEs, and the number HN of hidden-layer neurons of the perceptron neural network, and sets the opposition-based (backward) learning factor OBL;
Step S304-3, let the current generation t = 0 and the current evaluation count FEs = 0;
Step S304-4, let the input variables of the three-layer perceptron neural network be {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki)} and the output be L(i, 2) (the body-text label); then determine the transfer functions of the hidden layer and the output layer of the three-layer perceptron neural network, and calculate the number of optimization design parameters of the three-layer perceptron, DIM = HN × 8 + 1;
Step S304-5, set the lower bound LB_j and the upper bound UB_j of the DIM optimization design parameters of the three-layer perceptron, where j = 1, 2, ..., D;
Step S304-6, randomly generate the initial population P_t, where subscript i = 1, 2, ..., Popsize indexes the i-th individual in P_t and j = 1, 2, ..., D (the expressions for the population and the random initialization formula are not reproduced in this text). Each individual has a position, which stores the values of the DIM optimization design parameters of the three-layer perceptron, and a velocity on every dimension; rand(0, 1) is a function that generates random real numbers uniformly distributed on [0, 1];
Step S304-7, calculate the adaptive value of each individual in the population P_t;
Step S304-8, let the current evaluation count FEs = FEs + Popsize;
Step S304-9, save the optimum individual Best_t in the population P_t;
Step S304-10, calculate the current gravitational constant G_t (the formula is not reproduced in this text);
Step S304-11, calculate the mass of each individual in the population;
Step S304-12, calculate the current number of elite individuals KBest_t (the formula is not reproduced in this text);
Step S304-13, update the acceleration of the individuals in the population (the formula is not reproduced in this text);
Step S304-14, update the velocity and position of the individuals in the population;
Step S304-15, calculate the adaptive value of each individual in the population;
Step S304-16, let the current evaluation count FEs = FEs + Popsize;
Step S304-17, generate the intermediate chaos factor, which includes the following steps:
Step S304-171, set the number of updates Num of the intermediate chaos factor;
Step S304-172, let the intermediate chaos factor cf be a random real number in [0, 1]; if cf equals 0.25, 0.5, or 0.75, regenerate cf until it equals none of 0.25, 0.5, and 0.75;
Step S304-173, let the counter ki = 1;
Step S304-174, if the counter ki is greater than Num, go to step S304-18; otherwise go to step S304-175;
Step S304-175, update the intermediate chaos factor cf using the following formula:
cf = 4 × cf × (1 - cf)
Step S304-176, let the counter ki = ki + 1 and go to step S304-174.
Step S304-18, randomly select an individual and perform a chaotic local search on it to obtain the individual U_t, as follows:
Step S304-181, randomly generate a positive integer R1 in [1, Popsize];
Step S304-182, randomly generate a positive integer R2 not equal to R1 in [1, Popsize], and calculate the intermediate chaos factor cf;
Step S304-183, generate U_t (the generation formula is not reproduced in this text), where kj = 1, 2, ..., D;
Step S304-184, calculate the adaptive value Fit(U_t) of the individual U_t; if Fit(U_t) is better than the adaptive value of the individual it is compared with (whose symbol is not reproduced in this text), go to step S304-19; otherwise go to step S304-182;
Step S304-19, let the current generation t = t + 1;
Step S304-20, save the optimum individual Best_t in the population P_t;
Step S304-21, repeat steps S304-1 to S304-21 until the current evaluation count FEs reaches MAX_FEs, then terminate; the optimum individual Best_t obtained is decoded into the connection weights and bias of the three-layer perceptron neural network.
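The final decoding step can be sketched as follows. The text gives DIM = HN × 8 + 1 but not how the parameters are laid out; one layout consistent with that count for 7 inputs is 7 × HN input-to-hidden weights, HN hidden-to-output weights, and one output bias, with no hidden-layer biases. That layout, the parameter order, and the name `decode` are assumptions made only for illustration.

```python
def decode(individual, n_in=7, hn=3):
    """Split a flat parameter vector into (w_in, w_out, b_out).

    Assumed layout (not stated in the source):
      first n_in * hn values -> input-to-hidden weights, one row per hidden neuron
      next hn values         -> hidden-to-output weights
      last value             -> output bias
    """
    dim = hn * (n_in + 1) + 1  # equals HN * 8 + 1 when n_in is 7
    if len(individual) != dim:
        raise ValueError(f"expected {dim} parameters, got {len(individual)}")
    w_in = [list(individual[k * n_in:(k + 1) * n_in]) for k in range(hn)]
    w_out = list(individual[hn * n_in:hn * n_in + hn])
    b_out = individual[-1]
    return w_in, w_out, b_out
```

With HN = 3 this expects 25 parameters, which matches the DIM formula in step S304-4.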
In step S306, the text type of new text is determined according to the trained neural network model as follows:
For each tag, the information vector = {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki)} is input to the network to obtain L, from which it can be judged whether the tag is body text, so that the webpage body can be extracted.
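The extraction in step S306 can be sketched as a filter over per-tag network outputs. The sketch assumes the outputs lie in [0, 1] and that 0.5 is the decision threshold; the source does not state a threshold, and the name `extract_body` is hypothetical.

```python
def extract_body(tag_texts, outputs, threshold=0.5):
    """Join the texts of all tags whose network output L marks them as body text.

    threshold is an assumed cut-off for a network output in [0, 1].
    """
    body_tags = [text for text, out in zip(tag_texts, outputs) if out >= threshold]
    return "\n".join(body_tags)
```

For example, `extract_body(["导航", "这是正文"], [0.1, 0.9])` keeps only the second tag.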
From the description of the above embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which can be a mobile phone, a computer, a server, a network device, etc.) to execute the method described in each embodiment of the present invention.
Embodiment 2
This embodiment also provides a webpage processing device, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" can be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of the webpage processing device according to an embodiment of the present invention. As shown in Fig. 3, the device includes: a first obtaining module 402, configured to obtain text attribute values of a webpage in a training sample that contains a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage content is body text predominantly in the first language; a first determining module 404, coupled to the first obtaining module 402 and configured to use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage content is body text predominantly in the first language; a second determining module 406, coupled to the first determining module 404 and configured to determine the adaptive value of the population individuals in the perceptron neural network according to the second parameter value and the third parameter value; a decoding module 408, coupled to the second determining module 406 and configured to decode the individual with the optimal adaptive value in the population to obtain the connection weights and bias of the perceptron neural network; and a third determining module 410, coupled to the decoding module 408 and configured to determine, based on the connection weights and bias, whether a webpage to be processed is body text predominantly in the first language.
It should be noted that each of the above modules may be implemented by software or by hardware. In the latter case, this may be done in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
An embodiment of the present invention further provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.
Optionally, in this embodiment, the above storage medium may be configured to store a computer program for executing the following steps:
Step S1: use the first parameter value as the input variable of the perceptron neural network to determine a third parameter value indicating whether the webpage is a text based mainly on the first language;
Step S2: determine the adaptive value of the population individuals in the perceptron neural network according to the second parameter value and the third parameter value;
Step S3: decode the individual with the optimal adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network;
Step S4: determine, based on the connection weights and bias values, whether the webpage to be processed is a text based mainly on the first language.
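Step S2 can be illustrated with a minimal sketch. The text does not state how the adaptive value is computed from the second and third parameter values, so the example below assumes a mean squared error in which a lower value is better; the function name `adaptive_value` and the sample data are hypothetical.

```python
from typing import Sequence


def adaptive_value(labels: Sequence[float], outputs: Sequence[float]) -> float:
    """Assumed fitness: mean squared error between the known labels
    (second parameter values) and the network outputs (third parameter
    values). Lower is better under this assumption."""
    if len(labels) != len(outputs) or not labels:
        raise ValueError("labels and outputs must be non-empty and equally long")
    return sum((y - o) ** 2 for y, o in zip(labels, outputs)) / len(labels)


# Hypothetical data: 1.0 = page is mainly first-language text, 0.0 = it is not.
second_values = [1.0, 0.0, 1.0, 1.0]  # labels from the training sample
third_values = [0.9, 0.2, 0.6, 1.0]   # outputs of one candidate network
print(adaptive_value(second_values, third_values))
```

Any other error measure, such as the misclassification rate, could be substituted without changing the rest of the procedure.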
Optionally, in this embodiment, the above storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of the present invention further provides an electronic device including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
Optionally, the above electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the above processor, and the input/output device is connected to the above processor.
Optionally, for specific examples of this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be executed in an order different from the one given here, or the modules or steps can each be fabricated as a separate integrated circuit module, or several of them can be fabricated as a single integrated circuit module. In this way, the present invention is not limited to any specific combination of hardware and software.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, improvement, and the like made within the principles of the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. A webpage processing method, characterized by comprising:
obtaining text attribute values of webpages in a training sample that contain a first language, wherein the text attribute values include a first parameter value in the webpage corresponding to the first language, and a second parameter value indicating whether the webpage is a text based mainly on the first language;
using the first parameter value as the input variable of a three-layer perceptron neural network to determine a third parameter value indicating whether the webpage is a text based mainly on the first language;
determining the adaptive value of the population individuals in the three-layer perceptron neural network according to the second parameter value and the third parameter value;
decoding the individual with the optimal adaptive value in the population to obtain the connection weights and bias values of the three-layer perceptron neural network;
determining, based on the connection weights and bias values, whether a webpage to be processed is a text based mainly on the first language.
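The decoding and classification steps of claim 1 can be sketched as follows. The claim does not specify how an individual is laid out or which activation function is used, so this example assumes a flat parameter vector ordered as hidden weights, hidden biases, output weights, output bias, with sigmoid activations and a single output unit; `decode`, `predict`, and `n_hidden` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def decode(individual: Sequence[float], n_in: int, n_hidden: int) -> Tuple[Layer, Layer]:
    """Split a flat individual into hidden-layer and output-layer parameters
    (assumed layout: hidden weights, hidden biases, output weights, output bias)."""
    expected = n_in * n_hidden + n_hidden + n_hidden + 1
    if len(individual) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(individual)}")
    pos = 0
    w_hidden = []
    for _ in range(n_hidden):
        w_hidden.append(list(individual[pos:pos + n_in]))
        pos += n_in
    b_hidden = list(individual[pos:pos + n_hidden])
    pos += n_hidden
    w_out = [list(individual[pos:pos + n_hidden])]
    pos += n_hidden
    b_out = [individual[pos]]
    return (w_hidden, b_hidden), (w_out, b_out)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(individual: Sequence[float], features: Sequence[float], n_hidden: int) -> float:
    """Forward pass: input layer -> hidden layer -> single output in (0, 1)."""
    (w_h, b_h), (w_o, b_o) = decode(individual, len(features), n_hidden)
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_h, b_h)]
    return sigmoid(sum(w * h for w, h in zip(w_o[0], hidden)) + b_o[0])


# An all-zero individual yields an undecided output.
print(predict([0.0] * 13, [1.0, 2.0], n_hidden=3))
```

Under these assumptions, an output above a chosen threshold (for example 0.5) would be read as "the webpage is a text based mainly on the first language".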
2. The method according to claim 1, characterized in that obtaining the text attribute values of the webpages in the training sample that contain the first language comprises:
obtaining the proportion of the first language in the webpage containing the first language, the number of characters of the first language, and the total number of characters of the webpage containing the first language;
determining, according to the proportion and the number of characters, the mean of the proportion, the variance of the proportion, the mean of the number of characters, and the variance of the number of characters;
using the proportion of the first language, the number of characters of the first language, the total number of characters of the webpage containing the first language, the mean of the proportion, the variance of the proportion, the mean of the number of characters, and the variance of the number of characters as the first parameter value;
determining the second parameter value based on the first parameter value.
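The seven quantities listed in claim 2 can be computed as in the sketch below. Two points are assumptions rather than statements of the claim: the first language is taken to be Chinese and detected through the CJK Unified Ideographs range, and the means and variances are taken across the pages of the training sample. All function names are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, List


def is_first_language_char(ch: str) -> bool:
    """Assumption: first language = Chinese, i.e. CJK Unified Ideographs."""
    return "\u4e00" <= ch <= "\u9fff"


def page_counts(text: str) -> Dict[str, float]:
    """Per-page share of first-language characters and the two character counts."""
    total = sum(1 for ch in text if not ch.isspace())
    first = sum(1 for ch in text if is_first_language_char(ch))
    share = first / total if total else 0.0
    return {"share": share, "first_chars": first, "total_chars": total}


def first_parameter_values(pages: List[str]) -> List[List[float]]:
    """One 7-element feature row per page: share, first-language characters,
    total characters, then mean/variance of the share and of the
    first-language character count over the whole sample."""
    if not pages:
        raise ValueError("the training sample must contain at least one page")
    counts = [page_counts(p) for p in pages]
    shares = [c["share"] for c in counts]
    chars = [float(c["first_chars"]) for c in counts]
    stats = [fmean(shares), pvariance(shares), fmean(chars), pvariance(chars)]
    return [[c["share"], c["first_chars"], c["total_chars"], *stats] for c in counts]


for row in first_parameter_values(["中文ab", "abcd"]):
    print(row)
```

Each row can then serve as the input vector of the perceptron neural network.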
3. The method according to claim 1, characterized in that the population used to determine the connection weights and bias values of the three-layer perceptron neural network is generated by the following steps:
Step 11: set the lower bound $LB_j$ and the upper bound $UB_j$ of the D design parameters of the three-layer perceptron neural network to be optimized, where the subscript j = 1, 2, ..., D;
Step 12: randomly generate a first population $P_t$ containing Popsize individuals, wherein each individual in the first population stores the D design parameters to be optimized;
wherein the subscript i = 1, 2, ..., Popsize, and each such individual is the i-th individual in $P_t$.
4. The method according to claim 3, characterized in that the formula for random initialization is:
wherein the subscript j = 1, 2, ..., D, and rand(0,1) is a function that generates a random real number uniformly distributed on [0, 1].
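The initialization formula itself is not reproduced in this text. A common form that is consistent with the symbols defined in claims 3 and 4 is $x_j = LB_j + rand(0,1) \times (UB_j - LB_j)$, treating $LB_j$ as the lower bound and $UB_j$ as the upper bound. The sketch below assumes that form; `init_population` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence


def init_population(lb: Sequence[float], ub: Sequence[float], popsize: int,
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """Create popsize individuals, each with D = len(lb) parameters drawn
    uniformly between the per-parameter bounds (assumed formula:
    x_j = LB_j + rand(0,1) * (UB_j - LB_j))."""
    if len(lb) != len(ub):
        raise ValueError("lb and ub must have the same length")
    rng = rng or random.Random()
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub)]
            for _ in range(popsize)]


# Hypothetical bounds for D = 3 parameters and Popsize = 5.
population = init_population([-1.0, -1.0, 0.0], [1.0, 1.0, 5.0], popsize=5,
                             rng=random.Random(42))
for individual in population:
    print(individual)
```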
5. The method according to claim 3, characterized in that the individual with the optimal adaptive value in the population is decoded to obtain the connection weights and bias values of the three-layer perceptron neural network by the following steps:
Step 20: set the maximum number of evaluations MAX_FEs, initialize the current evaluation count FEs = 0, let the current evolution generation t = 0, and calculate the adaptive value of each individual in the first population $P_t$;
Step 21: let the current evaluation count FEs = FEs + Popsize, and save the optimal individual $Best_t$ according to the adaptive values of the individuals;
Step 22: calculate the current gravitational constant $G_t$ and the mass of each individual in the first population $P_t$, wherein the current gravitational constant $G_t$ is determined by the following formula:
Step 23: determine the current number of elite individuals $KBest_t$ in the first population $P_t$ according to the current gravitational constant $G_t$ and the mass of each individual, wherein the current number of elite individuals $KBest_t$ is determined by the following formula:
Step 24: update the acceleration, velocity, and position of each individual in the first population $P_t$ to obtain a second population, calculate the adaptive value of each individual in the second population, and let the current evaluation count FEs = FEs + Popsize;
Step 25: generate an intermediate chaos factor cf in the second population;
Step 26: randomly generate a positive integer R1 in [1, Popsize] for the second population, randomly generate a positive integer R2 in [1, Popsize] that is not equal to R1, and calculate the intermediate chaos factor cf;
Step 27: generate an individual $U_t$, wherein the generation formula is as follows:
Step 28: calculate the adaptive value Fit($U_t$) of the individual $U_t$; if Fit($U_t$) is better than the adaptive value of the individual being compared, go to step 29; otherwise go to step 26;
Step 29: let the current evolution generation t = t + 1, and save the optimal individual $Best_t$ in the second population; after the current evaluation count FEs exceeds MAX_FEs, decode the obtained optimal individual $Best_t$ into the connection weights and bias values of the three-layer perceptron neural network.
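The formulas for the gravitational constant and the elite count in steps 22 and 23 are not reproduced in this text. As an illustration only, the sketch below uses the forms commonly seen in gravitational search algorithms: an exponentially decaying constant, masses normalized from the adaptive values (assuming lower is better), and an elite count that shrinks linearly from Popsize to 1. The constants `g0` and `alpha`, the use of FEs/MAX_FEs as the progress measure, and all function names are assumptions and may differ from the patented formulas.

```python
import math
from typing import List, Sequence


def gravitational_constant(fes: int, max_fes: int,
                           g0: float = 100.0, alpha: float = 20.0) -> float:
    """Assumed schedule: G = g0 * exp(-alpha * FEs / MAX_FEs)."""
    return g0 * math.exp(-alpha * fes / max_fes)


def masses(fitness: Sequence[float]) -> List[float]:
    """Normalized masses for a minimization problem: the best individual
    gets the largest mass, the worst gets zero, and the masses sum to 1."""
    best, worst = min(fitness), max(fitness)
    if best == worst:  # all individuals equally fit
        return [1.0 / len(fitness)] * len(fitness)
    raw = [(worst - f) / (worst - best) for f in fitness]
    total = sum(raw)
    return [m / total for m in raw]


def elite_count(fes: int, max_fes: int, popsize: int) -> int:
    """Assumed schedule: linear decrease from popsize to 1 as FEs grows."""
    frac = min(max(fes / max_fes, 0.0), 1.0)
    return max(1, round(popsize - (popsize - 1) * frac))


fitness = [0.30, 0.10, 0.25, 0.40]  # hypothetical adaptive values
print(gravitational_constant(fes=200, max_fes=1000))
print(masses(fitness))
print(elite_count(fes=200, max_fes=1000, popsize=len(fitness)))
```

In such schemes only the elite individuals exert force on the others, so shrinking the elite count shifts the search from exploration toward exploitation as the evaluation budget is used up.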
6. The method according to claim 5, characterized in that the intermediate chaos factor cf is generated by the following steps:
Step 30: initialize an intermediate chaos factor, and set the number of updates Num of the initialized intermediate chaos factor;
Step 31: set the initialized intermediate chaos factor to a random real number within a first value range; if the initialized intermediate chaos factor equals a second preset value, regenerate it until it is not equal to the second preset value, wherein the second preset value lies within the first value range;
Step 32: let the counter ki = 1; if the counter ki is greater than Num, randomly select an individual and perform a chaotic local search on that individual; otherwise go to step 33;
Step 33: update the initialized intermediate chaos factor to obtain the intermediate chaos factor cf, using the following formula:
cf = 4 × cf × (1 − cf);
Step 34: let the counter ki = ki + 1, and go to step 32.
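The update in step 33 is the logistic map, which can be sketched as below. The claim does not state the first value range or the second preset value; this example assumes the range (0, 1) and excludes the starting values 0, 0.25, 0.5, 0.75, and 1, from which the logistic map collapses to a fixed point. The names `initial_chaos_factor` and `chaos_sequence` are hypothetical.

```python
import random
from typing import List

# Assumed excluded starting values: from these the map 4*cf*(1-cf)
# reaches a fixed point (0 or 0.75) and stops behaving chaotically.
DEGENERATE = {0.0, 0.25, 0.5, 0.75, 1.0}


def initial_chaos_factor(rng: random.Random) -> float:
    """Draw a starting value, redrawing if it hits a degenerate value."""
    cf = rng.random()
    while cf in DEGENERATE:
        cf = rng.random()
    return cf


def chaos_sequence(cf: float, num: int) -> List[float]:
    """Apply cf = 4 * cf * (1 - cf) num times and return every iterate."""
    values = []
    for _ in range(num):
        cf = 4.0 * cf * (1.0 - cf)
        values.append(cf)
    return values


rng = random.Random(7)
start = initial_chaos_factor(rng)
print(start, chaos_sequence(start, num=5))
```

Starting from 0.75, for instance, every iterate stays at 0.75, which is why such values are redrawn in this sketch.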
7. A webpage processing device, characterized by comprising:
a first obtaining module, configured to obtain text attribute values of webpages in a training sample that contain a first language, wherein the text attribute values include a first parameter value in the webpage corresponding to the first language, and a second parameter value indicating whether the webpage is a text based mainly on the first language;
a first determining module, configured to use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage is a text based mainly on the first language;
a second determining module, configured to determine the adaptive value of the population individuals in the perceptron neural network according to the second parameter value and the third parameter value;
a decoding module, configured to decode the individual with the optimal adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network;
a third determining module, configured to determine, based on the connection weights and bias values, whether a webpage to be processed is a text based mainly on the first language.
8. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 6.
9. An electronic device, including a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810725738.2A CN108984694B (en) | 2018-07-04 | 2018-07-04 | The processing method and processing device of webpage, storage medium, electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108984694A (en) | 2018-12-11 |
CN108984694B (en) | 2019-07-30 |
Family
ID=64536124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810725738.2A Active CN108984694B (en) | 2018-07-04 | 2018-07-04 | The processing method and processing device of webpage, storage medium, electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108984694B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108984692B (en) * | 2018-07-04 | 2019-06-21 | 龙马智芯(珠海横琴)科技有限公司 | The processing method and processing device of webpage, storage medium, electronic device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874410A (en) * | 2017-01-22 | 2017-06-20 | 清华大学 | Chinese microblogging text mood sorting technique and its system based on convolutional neural networks |
CN108170660A (en) * | 2018-01-22 | 2018-06-15 | 北京百度网讯科技有限公司 | Display methods, device, browser, terminal and the medium of multilingual typesetting |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103870595A (en) * | 2014-04-01 | 2014-06-18 | 深圳市科盾科技有限公司 | Data mining system and method |
CN105302884B (en) * | 2015-10-19 | 2019-02-19 | 天津海量信息技术股份有限公司 | Webpage mode identification method and visual structure learning method based on deep learning |
US10672025B2 (en) * | 2016-03-08 | 2020-06-02 | Oath Inc. | System and method for traffic quality based pricing via deep neural language models |
CN106651030B (en) * | 2016-12-21 | 2020-08-04 | 重庆邮电大学 | Improved RBF neural network hot topic user participation behavior prediction method |
CN108021555A (en) * | 2017-11-21 | 2018-05-11 | 浪潮金融信息技术有限公司 | A kind of Question sentence parsing measure based on depth convolutional neural networks |
CN108984692B (en) * | 2018-07-04 | 2019-06-21 | 龙马智芯(珠海横琴)科技有限公司 | The processing method and processing device of webpage, storage medium, electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CP02 | Change in the address of a patent holder | Address after: 519031 office 1316, No. 1, lianao Road, Hengqin new area, Zhuhai, Guangdong. Patentee after: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd. Address before: 519000 room 417, building 20, creative Valley, Hengqin new area, Xiangzhou, Zhuhai, Guangdong. Patentee before: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd. |