CN108984694B - The processing method and processing device of webpage, storage medium, electronic device - Google Patents

Info

Publication number
CN108984694B
Authority
CN
China
Prior art keywords
webpage
individual
value
language
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810725738.2A
Other languages
Chinese (zh)
Other versions
CN108984694A (en
Inventor
张峰
聂颖
郑权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dragon Horse Zhixin (zhuhai Hengqin) Technology Co Ltd
Original Assignee
Dragon Horse Zhixin (zhuhai Hengqin) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dragon Horse Zhixin (zhuhai Hengqin) Technology Co Ltd
Priority to CN201810725738.2A
Publication of CN108984694A
Application granted
Publication of CN108984694B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The present invention provides a webpage processing method and device, a storage medium, and an electronic device. The method comprises: obtaining text attribute values of webpages in a training sample that contain a first language; using the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage is text mainly in the first language; determining the adaptive value of the individuals of a population in the perceptron neural network according to the second parameter value and the third parameter value; decoding the individual with the best adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network; and determining, based on the connection weights and bias values, whether a webpage to be processed is text mainly in the first language. The invention addresses the problem in the related art that the parameters used to extract a webpage are set in advance from experience and the characteristics of the webpage structure, so that improper parameter settings lead to inaccurate extraction of the webpage body text, and it thereby improves the user experience.

Description

Webpage processing method and device, storage medium, and electronic device
Technical Field
The present invention relates to the field of communications, and in particular to a webpage processing method and device, a storage medium, and an electronic device.
Background
In the schemes for extracting webpage body content provided by the prior art, whether a piece of webpage content is body content is inferred entirely from content statistics of the webpage tags. This technique depends heavily on manually set parameters, and different parameters must be set for different webpages based on experience. After a webpage is loaded in a browser, its content is split, the webpage content is located using a matching-rule file in the browser, and the required fields are extracted and displayed, so that the user sees the webpage after body-text filtering and can read conveniently and with focus.
Body-text extraction for Chinese webpages can be regarded as a classification problem: each piece of webpage text is classified as body content or not, and the body content is extracted according to the classification results. However, the prior art depends heavily on manually set parameters, and different parameters must be set for different webpages based on experience. These methods place high demands on parameter setting; if the parameters are set improperly, the body text is extracted inaccurately.
No effective solution to the above problems in the related art has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a webpage processing method and device, a storage medium, and an electronic device, to at least solve the problem in the related art that the parameters used to extract a webpage are set in advance from experience and the characteristics of the webpage structure, so that improper parameter settings lead to inaccurate extraction of the webpage body text.
According to one embodiment of the present invention, a webpage processing method is provided, comprising: obtaining text attribute values of webpages in a training sample that contain a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage is text mainly in the first language; using the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage is text mainly in the first language; determining the adaptive value of the individuals of a population in the perceptron neural network according to the second parameter value and the third parameter value; decoding the individual with the best adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network; and determining, based on the connection weights and bias values, whether a webpage to be processed is text mainly in the first language.
According to another embodiment of the present invention, a webpage processing device is provided, comprising: a first obtaining module, configured to obtain text attribute values of webpages in a training sample that contain a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage is text mainly in the first language; a first determining module, configured to use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage is text mainly in the first language; a second determining module, configured to determine the adaptive value of the individuals of a population in the perceptron neural network according to the second parameter value and the third parameter value; a decoding module, configured to decode the individual with the best adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network; and a third determining module, configured to determine, based on the connection weights and bias values, whether a webpage to be processed is text mainly in the first language.
According to still another embodiment of the present invention, a storage medium is further provided. A computer program is stored in the storage medium, and the computer program is configured to execute, when run, the steps in any of the above method embodiments.
According to still another embodiment of the present invention, an electronic device is further provided, comprising a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
With the present invention, the text attribute values of webpages in the training sample that contain the first language are obtained, the adaptive value of the individuals of the population in the perceptron neural network is determined from them, and the connection weights and bias values of the perceptron neural network are then determined. When a webpage text is to be processed, the connection weights and bias values of the perceptron neural network determine whether that text is mainly in the first language. The webpage body text is therefore no longer determined from parameters set in advance but by a trained perceptron neural network. This solves the problem in the related art that the parameters used to extract a webpage are set in advance from experience and the characteristics of the webpage structure, so that improper parameter settings lead to inaccurate extraction of the webpage body text, and it improves the user experience.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a terminal for the webpage processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of the webpage processing method according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of the webpage processing device according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in them can be combined with each other.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and not to describe a particular order or sequence.
Embodiment 1
The method embodiment provided in Embodiment 1 of this application can be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, Fig. 1 is a hardware block diagram of a terminal for the webpage processing method according to an embodiment of the present invention. As shown in Fig. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in Fig. 1; the processor 102 may include, but is not limited to, a processing unit such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108. Those of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
The memory 104 can be used to store computer programs, for example the software programs and modules of application software, such as the computer program corresponding to the webpage processing method in the embodiments of the present invention. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, it implements the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory can be connected to the mobile terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of such a network is the wireless network provided by the communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission device 106 can be a radio frequency (RF) module, which communicates with the Internet wirelessly.
This embodiment provides a webpage processing method. Fig. 2 is a flowchart of the webpage processing method according to an embodiment of the present invention. As shown in Fig. 2, the flow includes the following steps:
Step S202: obtain text attribute values of webpages in a training sample that contain a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage is text mainly in the first language;
Step S204: use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage is text mainly in the first language;
Step S206: determine the adaptive value of the individuals of a population in the perceptron neural network according to the second parameter value and the third parameter value;
Step S208: decode the individual with the best adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network;
Step S210: determine, based on the connection weights and bias values, whether a webpage to be processed is text mainly in the first language.
Through steps S202 to S210 above, the text attribute values of webpages in the training sample that contain the first language are obtained, the adaptive value of the individuals of the population in the perceptron neural network is determined from them, and the connection weights and bias values of the perceptron neural network are then determined. When a webpage text is to be processed (a new webpage text), the connection weights and bias values of the perceptron neural network determine whether the new text is mainly in the first language. The webpage body text is therefore no longer determined from parameters set in advance but by a trained perceptron neural network. This solves the problem in the related art that the parameters used to extract a webpage are set in advance from experience and the characteristics of the webpage structure, so that improper parameter settings lead to inaccurate extraction of the webpage body text, and it improves the user experience.
It should be noted that the first language in this embodiment can be Chinese, Korean, Japanese, and so on, and can be configured according to the user's needs.
In an optional implementation of this embodiment, obtaining the text attribute values of webpages in the training sample that contain the first language (step S202) can be implemented as follows:
Step S202-1: for a webpage containing the first language, obtain the proportion of the first language, the character count of the first language, and the total character count of the webpage;
Step S202-2: from the proportion and the character count, determine the mean of the proportion, the variance of the proportion, the mean of the character count, and the variance of the character count;
Step S202-3: take the proportion of the first language, the character count of the first language, the total character count of the webpage containing the first language, the mean of the proportion, the variance of the proportion, the mean of the character count, and the variance of the character count as the first parameter value;
Step S202-4: determine the second parameter value based on the first parameter value.
Steps S202-1 to S202-4 are illustrated below with Chinese as the first language. In an optional implementation of this embodiment, steps S202-1 to S202-4 may include:
Step 1: For each of a number of webpages, extract, according to its HTML structure, every tag that contains Chinese content (such content usually sits in div tags) and put it into the tag information list labellist = {L(1), L(2), ..., L(i), ..., L(num)}, where num is the number of tags and L(i) = {L(i, j)}, j = 1, 2, is the tag information: L(i, 1) stores the tag content, and L(i, 2) stores the status word indicating whether the tag is body text.
Step 2: Calculate the Chinese proportion inta and the number of Chinese characters (Chinese number) in each tag.
Step 3: According to the character encoding of Chinese characters, count the number of Chinese characters CN(ki) in L(ki) and the total number of text characters AN(ki) in the whole tag content.
The Chinese proportion inta(ki) of L(ki) is then calculated as:
inta(ki) = CN(ki) / AN(ki)
Step 4: From the Chinese proportion inta(ki) and the Chinese character count CN(ki), calculate the text attribute value power(ki).
inta and CN are first normalized, using the following formulas:
Norinta(ki) = (inta(ki) - intamean) / stdinta
NorCN(ki) = (CN(ki) - CNmean) / stdCN
power(ki) = Norinta(ki) * NorCN(ki)
where intamean is the mean of inta, stdinta is the variance of inta, CNmean is the mean of CN, and stdCN is the variance of CN.
Step 5: For each tag, obtain the vector {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki), L(ki, 2)} of eight parameters.
Here intamean is the mean of the Chinese proportion over all tags, stdinta is the variance of the Chinese proportion over all tags, CNmean is the mean of the Chinese character count over all tags, stdCN is the variance of the Chinese character count over all tags, CN(ki) is the Chinese character count of the tag, AN(ki) is the total text character count of the tag content, and L(ki, 2) stores the status word indicating whether the tag is body text.
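The feature computation in steps 2 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `chinese_ratio` and `tag_features` are hypothetical, Chinese characters are detected with the CJK Unified Ideographs range U+4E00 to U+9FFF (the text only says "according to the encoding of Chinese characters"), and the normalizers stdinta and stdCN are computed as population standard deviations because of their `std` prefix, although the text calls them variances.

```python
import statistics


def chinese_ratio(text):
    """Return (CN, AN, inta): Chinese character count, total character count, ratio."""
    # Assumption: "Chinese character" means the CJK Unified Ideographs block.
    cn = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    an = len(text)
    return cn, an, (cn / an if an else 0.0)


def tag_features(tag_texts):
    """Build the 7-value input vector for every tag (non-empty list of tag contents)."""
    stats = [chinese_ratio(t) for t in tag_texts]
    cns = [s[0] for s in stats]
    intas = [s[2] for s in stats]

    inta_mean = statistics.fmean(intas)
    cn_mean = statistics.fmean(cns)
    # Assumption: standard deviation as the normalizer; fall back to 1.0
    # so that identical tags do not cause a division by zero.
    std_inta = statistics.pstdev(intas) or 1.0
    std_cn = statistics.pstdev(cns) or 1.0

    features = []
    for cn, an, inta in stats:
        nor_inta = (inta - inta_mean) / std_inta
        nor_cn = (cn - cn_mean) / std_cn
        power = nor_inta * nor_cn
        features.append([inta_mean, std_inta, cn_mean, std_cn, an, cn, power])
    return features
```

The eighth parameter L(ki, 2), the body-text status word, is the training label and is therefore kept out of the input vector here.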
In another optional implementation of this embodiment, the population used in steps S202 to S210 to determine the connection weights and bias values of the perceptron neural network can be generated by the following method, which includes:
Step 11: set the lower bound LB_j and upper bound UB_j of the DIM parameters to be optimized of the three-layer perceptron neural network, where the subscript j = 1, 2, ..., D;
Step 12: randomly generate a first population P_t containing Popsize individuals, where each individual in the first population stores the DIM parameters to be optimized;
where the subscript i = 1, 2, ..., Popsize denotes the i-th individual in P_t [formula image not preserved].
Optionally, the random initialization formula is: [formula image not preserved], where the subscript j = 1, 2, ..., D, and rand(0, 1) is a function that generates a random real number uniformly distributed on [0, 1].
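As an illustration of step 12, the sketch below draws each coordinate uniformly between its bounds. The initialization formula itself is not available in this text, so the common form lower + rand(0, 1) × (upper - lower) is an assumption, as is the function name `init_population`; `lb` is treated as the lower bound and `ub` as the upper bound.

```python
import random


def init_population(popsize, lb, ub, rng=None):
    """Return popsize individuals, each a list of len(lb) parameters within [lb_j, ub_j]."""
    rng = rng or random.Random()
    dim = len(lb)
    return [
        # Assumed uniform initialization: lb_j + rand(0, 1) * (ub_j - lb_j)
        [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
        for _ in range(popsize)
    ]
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing different settings of Popsize.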
In another optional implementation of this embodiment, the individual with the best adaptive value in the population is decoded into the connection weights and bias values of the three-layer perceptron neural network through the following steps:
Step 20: set the maximum number of evaluations MAX_FEs and the initial current number of evaluations FEs = 0, set the current generation t = 0, and calculate the adaptive value of each individual in the first population P_t;
Step 21: set the current number of evaluations FEs = FEs + Popsize, and save the individual Best_t with the best adaptive value;
Step 22: calculate the current gravitational constant G_t and the mass of each individual in the first population P_t, where the current gravitational constant G_t is determined by the following formula: [formula image not preserved]
Step 23: according to the current gravitational constant G_t and the mass of each individual, determine the current number of elite individuals KBest_t in the first population P_t, where KBest_t is determined by the following formula: [formula image not preserved]
Step 24: update the acceleration, velocity, and position of each individual in the first population P_t to obtain a second population, calculate the adaptive value of each individual in the second population, and set the current number of evaluations FEs = FEs + Popsize;
Step 25: generate the intermediate chaos factor cf in the second population;
Step 26: randomly generate a positive integer R1 in [1, Popsize] of the second population, randomly generate a positive integer R2 in [1, Popsize] that is not equal to R1, and calculate the intermediate chaos factor cf;
Step 27: generate the individual U_t, using the following formula: [formula image not preserved]
Step 28: calculate the adaptive value Fit(U_t) of the individual U_t; if Fit(U_t) is better than the adaptive value of the individual it is compared with [symbol not preserved], go to step 29, otherwise go to step 26;
Step 29: set the current generation t = t + 1 and save the best individual Best_t in the second population; after the current number of evaluations FEs exceeds MAX_FEs, decode the resulting best individual Best_t into the connection weights and bias values of the three-layer perceptron neural network.
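Steps 22 to 24 match the structure of the gravitational search algorithm (GSA). Because the formulas for G_t and KBest_t are not available in this text, the sketch below uses the commonly published GSA form as an assumption: an exponentially decaying gravitational constant, masses normalized from the adaptive values (lower is better here), and an elite count that shrinks linearly from the population size to 1. The function name `gsa_step` and the defaults `g0` and `alpha` are hypothetical, and the chaos local search of steps 25 to 28 is not included.

```python
import math
import random


def gsa_step(positions, velocities, fitness, t, max_t, g0=100.0, alpha=20.0, rng=None):
    """One GSA iteration for a minimization problem; returns new positions and velocities."""
    rng = rng or random.Random()
    n, dim = len(positions), len(positions[0])

    # Assumed decay law: G_t = g0 * exp(-alpha * t / max_t)
    g = g0 * math.exp(-alpha * t / max_t)

    # Masses: the best individual gets the largest mass, the worst gets zero.
    best, worst = min(fitness), max(fitness)
    if best == worst:
        masses = [1.0 / n] * n
    else:
        raw = [(f - worst) / (best - worst) for f in fitness]
        total = sum(raw)
        masses = [m / total for m in raw]

    # Assumed schedule: elite count decreases linearly from n to 1.
    kbest = max(1, round(n - (n - 1) * t / max_t))
    elite = sorted(range(n), key=lambda i: fitness[i])[:kbest]

    eps = 1e-12  # avoids division by zero for coincident individuals
    new_pos, new_vel = [], []
    for i in range(n):
        acc = [0.0] * dim
        for j in elite:
            if j == i:
                continue
            dist = math.dist(positions[i], positions[j])
            coef = g * masses[j] / (dist + eps)
            for d in range(dim):
                # Randomly weighted attraction towards elite individual j.
                acc[d] += rng.random() * coef * (positions[j][d] - positions[i][d])
        vel = [rng.random() * velocities[i][d] + acc[d] for d in range(dim)]
        new_vel.append(vel)
        new_pos.append([positions[i][d] + vel[d] for d in range(dim)])
    return new_pos, new_vel, g, kbest
```

A caller would normally clip the new positions back into the parameter bounds and re-evaluate the adaptive values after each call, adding Popsize to FEs as in step 24.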
In addition, in this embodiment the intermediate chaos factor cf can be determined as follows:
Step 30: initialize an intermediate chaos factor and set its number of updates Num;
Step 31: set the initial intermediate chaos factor to a random real number within a first value range; if the initial intermediate chaos factor equals a second preset value, regenerate it until it no longer equals the second preset value, where the second preset value lies within the first value range;
Step 32: set the counter ki = 1; if the counter ki is greater than Num, randomly select an individual and perform a chaos local search on it, otherwise go to step 33;
Step 33: update the initial intermediate chaos factor to obtain the intermediate chaos factor cf, using the formula cf = 4 × cf × (1 - cf);
Step 34: set the counter ki = ki + 1 and go to step 32.
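The update in step 33 is the logistic map with parameter 4. A minimal sketch of steps 30 to 34 follows, assuming the first value range is [0, 1] and the excluded values are 0.25, 0.5, and 0.75, as the specific embodiment later in this description states; the function names `logistic_update` and `chaos_factor` are hypothetical.

```python
import random


def logistic_update(cf):
    """One update of the intermediate chaos factor: cf = 4 * cf * (1 - cf)."""
    return 4.0 * cf * (1.0 - cf)


def chaos_factor(num_updates, rng=None, forbidden=(0.25, 0.5, 0.75)):
    """Draw a start value, reject the forbidden values, then apply num_updates updates."""
    rng = rng or random.Random()
    cf = rng.random()
    while cf in forbidden:
        cf = rng.random()
    for _ in range(num_updates):
        cf = logistic_update(cf)
    return cf
```

The excluded start values are the ones for which the map stops behaving chaotically: 0.25 maps to 0.75, which is a fixed point, and 0.5 maps to 1 and then to 0, where the sequence stays.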
The method steps of this embodiment are described in detail below with reference to a specific implementation. This specific implementation provides a method for extracting the body text of Chinese webpages based on gravitational search optimization, and the method includes the following steps:
Step S302: carry out text training with 7 parameters of each tag as input and one parameter as output. It should be noted that 7 parameters are preferred in this embodiment, but another number can be set as needed.
Step S304: obtain a neural network model from the text training result;
Step S306: determine the text type of a new text with the trained neural network model.
First, the text training in step S302 includes the following steps:
Step S302-1: For each of a number of webpages, extract, according to its HTML structure, every tag that contains Chinese content (such content usually sits in div tags) and put it into the tag information list labellist = {L(1), L(2), ..., L(i), ..., L(num)}, where num is the number of tags and L(i) = {L(i, j)}, j = 1, 2, is the tag information: L(i, 1) stores the tag content, and L(i, 2) stores the status word indicating whether the tag is body text.
Step S302-2: Calculate the Chinese proportion inta and the number of Chinese characters (Chinese number) in each tag;
Step S302-3: According to the character encoding of Chinese characters, count the number of Chinese characters CN(ki) in L(ki) and the total number of text characters AN(ki) in the whole tag content;
The Chinese proportion inta(ki) of L(ki) is then calculated as:
inta(ki) = CN(ki) / AN(ki);
Step S302-4: From the Chinese proportion inta(ki) and the Chinese character count CN(ki), calculate the text attribute value power(ki). inta and CN are first normalized, using the following formulas:
Norinta(ki) = (inta(ki) - intamean) / stdinta
NorCN(ki) = (CN(ki) - CNmean) / stdCN
power(ki) = Norinta(ki) * NorCN(ki);
where intamean is the mean of inta, stdinta is the variance of inta, CNmean is the mean of CN, and stdCN is the variance of CN.
Step S302-5: For each tag, obtain the vector {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki), L(ki, 2)} of eight parameters.
Here intamean is the mean of the Chinese proportion over all tags, stdinta is the variance of the Chinese proportion over all tags, CNmean is the mean of the Chinese character count over all tags, stdCN is the variance of the Chinese character count over all tags, CN(ki) is the Chinese character count of the tag, AN(ki) is the total text character count of the tag content, and L(ki, 2) stores the status word indicating whether the tag is body text.
Training the neural network model in step S304 includes the following steps:
Step S304-1: extract the training sample; the first 80% is set as the training data set of the neural network, containing TraNum groups of data, and the last 20% is set as the test data set, containing TestNum groups of data;
Step S304-2: the user initializes the parameters: the population size Popsize, the maximum number of evaluations MAX_FEs, the number HN of hidden-layer neurons of the perceptron neural network, and the opposition-based learning factor OBL;
Step S304-3: set the current generation t = 0 and the current number of evaluations FEs = 0;
Step S304-4: let the input variables of the three-layer perceptron neural network be {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki)} and the output be L(i, 2) (the body-text label); then determine the transfer functions of the hidden layer and the output layer of the three-layer perceptron neural network, and calculate the number of parameters to be optimized of the three-layer perceptron, DIM = HN × 8 + 1;
Step S304-5: set the lower bound LB_j and upper bound UB_j of the DIM parameters to be optimized of the three-layer perceptron, where j = 1, 2, ..., D;
Step S304-6: randomly generate the initial population P_t [formula image not preserved], where the subscript i = 1, 2, ..., Popsize denotes the i-th individual in the population P_t; the random initialization formula is: [formula image not preserved]
where j = 1, 2, ..., D; the position of the i-th individual stores the values of the DIM parameters to be optimized of the three-layer perceptron; each individual also has a velocity in every dimension; and rand(0, 1) is a function that generates a random real number uniformly distributed on [0, 1];
Step S304-7 calculates population PtIn each individual adaptive value;
Step S304-8 enables Evaluation: Current number FEs=FEs+Popsize;
Step S304-9: save the best individual Best_t in the population P_t;
Step S304-10: calculate the current gravitational constant G_t as follows: [formula image not preserved]
Step S304-11: calculate the mass of each individual in the population;
Step S304-12: calculate the current number of elite individuals KBest_t as follows: [formula image not preserved]
Step S304-13: update the acceleration of each individual in the population: [formula image not preserved]
Step S304-14, the speed of Population Regeneration individual and position;
Step S304-15 calculates the adaptive value of each individual of population;
Step S304-16 enables Evaluation: Current number FEs=FEs+Popsize;
Step S304-17: generate the intermediate chaos factor, which includes the following steps:
Step S304-171: set the number of updates Num of the intermediate chaos factor;
Step S304-172: set the intermediate chaos factor cf to a random real number in [0, 1]; if cf equals 0.25, 0.5, or 0.75, regenerate cf until it does not equal 0.25, 0.5, or 0.75;
Step S304-173: set the counter ki = 1;
Step S304-174: if the counter ki is greater than Num, go to step S304-18, otherwise go to step S304-175;
Step S304-175: update the intermediate chaos factor cf with the following formula:
cf = 4 × cf × (1 - cf)
Step S304-176: set the counter ki = ki + 1 and go to step S304-174.
Step S304-18: randomly select an individual and perform a chaos local search on it to obtain the individual U_t, as follows:
Step S304-181: randomly generate a positive integer R1 in [1, Popsize];
Step S304-182: randomly generate a positive integer R2 in [1, Popsize] that is not equal to R1, and calculate the intermediate chaos factor cf;
Step S304-183: generate U_t with the following formula: [formula image not preserved]
where kj = 1, 2, ..., D;
Step S304-184: calculate the adaptive value Fit(U_t) of the individual U_t; if Fit(U_t) is better than the adaptive value of the individual it is compared with [symbol not preserved], go to step S304-19, otherwise go to step S304-182;
Step S304-19: set the current generation t = t + 1;
Step S304-20: save the best individual Best_t in the population P_t;
Step S304-21: repeat steps S304-1 to S304-21 until the current number of evaluations FEs reaches MAX_FEs, then stop; the resulting best individual Best_t is decoded into the connection weights and bias values of the three-layer perceptron neural network.
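The decoding of Best_t and the evaluation of an individual's adaptive value can be sketched as follows. The text gives DIM = HN × 8 + 1 but not the layout of the vector, the transfer functions, or the formula for the adaptive value, so everything specific here is an assumption: 7 × HN input-to-hidden weights, HN hidden-to-output weights, and a single output bias (which adds up to HN × 8 + 1), a sigmoid transfer function in both layers, and the mean squared error against the body-text label as the adaptive value (lower is better). The names `decode`, `predict`, and `adaptive_value` are hypothetical.

```python
import math

N_INPUTS = 7  # intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki)


def decode(individual, hn):
    """Split a flat parameter vector of length hn * 8 + 1 into weights and bias."""
    if len(individual) != hn * 8 + 1:
        raise ValueError("expected DIM = HN * 8 + 1 parameters")
    w_in = [individual[h * N_INPUTS:(h + 1) * N_INPUTS] for h in range(hn)]
    offset = hn * N_INPUTS
    w_out = individual[offset:offset + hn]
    bias = individual[offset + hn]
    return w_in, w_out, bias


def sigmoid(z):
    """Numerically stable logistic function (assumed transfer function)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(individual, hn, x):
    """Forward pass of the three-layer perceptron for one 7-value input vector."""
    w_in, w_out, bias = decode(individual, hn)
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + bias)


def adaptive_value(individual, hn, samples):
    """Assumed adaptive value: mean squared error over (input vector, label) pairs."""
    errors = [(predict(individual, hn, x) - y) ** 2 for x, y in samples]
    return sum(errors) / len(errors)
```

In this reading, the network output plays the role of the third parameter value and the label L(i, 2) the role of the second parameter value, so `adaptive_value` compares the two as step S206 describes. A new tag could then be classified by thresholding `predict` at, for example, 0.5, which is again an assumption.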
In step S306, the text type of a new text is determined with the trained neural network model as follows:
For each tag, the information vector {intamean, stdinta, CNmean, stdCN, AN(ki), CN(ki), power(ki)} is fed into the trained network as input to obtain L, which indicates whether the tag is body text, and the webpage body is extracted accordingly.
From the description of the above implementations, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which can be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 2
This embodiment also provides a webpage processing device, which implements the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a structural block diagram of the webpage processing device according to an embodiment of the present invention. As shown in Fig. 3, the device includes: a first obtaining module 402, configured to obtain text attribute values of webpages in a training sample that contain a first language, wherein the text attribute values include a first parameter value corresponding to the first language in the webpage and a second parameter value indicating whether the webpage is text mainly in the first language; a first determining module 404, coupled to the first obtaining module 402 and configured to use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value indicating whether the webpage is text mainly in the first language; a second determining module 406, coupled to the first determining module 404 and configured to determine the adaptive value of the individuals of a population in the perceptron neural network according to the second parameter value and the third parameter value; a decoding module 408, coupled to the second determining module 406 and configured to decode the individual with the best adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network; and a third determining module 410, coupled to the decoding module 408 and configured to determine, based on the connection weights and bias values, whether a webpage to be processed is text mainly in the first language.
It should be noted that the above modules can be implemented in software or hardware. In the latter case, this can be done in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are distributed over different processors in any combination.
The embodiments of the present invention also provide a kind of storage medium, computer program is stored in the storage medium, wherein The computer program is arranged to execute the step in any of the above-described embodiment of the method when operation.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
Step S1: use the first parameter value as the input variable of the perceptron neural network to determine a third parameter value that indicates whether the web page is text primarily in the first language;
Step S2: determine the adaptive value of the individuals of the population in the perceptron neural network according to the second parameter value and the third parameter value;
Step S3: decode the individual with the optimal adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network;
Step S4: determine, based on the connection weights and bias values, whether a web page to be processed is text primarily in the first language.
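Steps S1 to S3 can be illustrated with a small Python sketch. It is not taken from the patent: the text does not say how the adaptive value is computed from the second and third parameter values, so the mean squared error used here is an assumption, and the names `adaptive_value`, `best_individual`, and `build_predictor` are hypothetical.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def adaptive_value(predict: Predictor,
                   features: Sequence[Sequence[float]],
                   labels: Sequence[float]) -> float:
    """Score one candidate network; lower is better (assumed convention)."""
    # S1: feed the first parameter values through the network to get the
    # third parameter values (the predicted labels).
    predicted = [predict(x) for x in features]
    # S2: compare them with the second parameter values (the known labels).
    # Mean squared error is an assumption, not stated in the source.
    squared = [(p - y) ** 2 for p, y in zip(predicted, labels)]
    return sum(squared) / len(squared)


def best_individual(population: List[List[float]],
                    build_predictor: Callable[[List[float]], Predictor],
                    features: Sequence[Sequence[float]],
                    labels: Sequence[float]) -> List[float]:
    """S3 (selection part): pick the individual with the best adaptive value."""
    return min(
        population,
        key=lambda ind: adaptive_value(build_predictor(ind), features, labels),
    )
```

Here `build_predictor` stands for whatever turns an individual's parameter vector into a network; passing it in keeps the scoring logic independent of the network layout.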
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing a computer program.
An embodiment of the present invention also provides an electronic device including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor and the input/output device is connected to the processor.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one given here, or the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, improvement, and the like made within the principle of the present invention shall fall within its scope of protection.

Claims (9)

1. A web page processing method, characterized by comprising:
Obtaining text attribute values of web pages in a training sample that contain a first language, wherein the text attribute values include: a first parameter value that corresponds to the first language in the web page, and a second parameter value that indicates whether the web page is text primarily in the first language;
Using the first parameter value as the input variable of a three-layer perceptron neural network to determine a third parameter value that indicates whether the web page is text primarily in the first language;
Determining the adaptive value of the individuals of a population in the three-layer perceptron neural network according to the second parameter value and the third parameter value;
Decoding the individual with the optimal adaptive value in the population to obtain the connection weights and bias values of the three-layer perceptron neural network;
Determining, based on the connection weights and bias values, whether a web page to be processed is text primarily in the first language.
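The last two steps of claim 1 can be sketched in Python as decoding a flat parameter vector into a three-layer perceptron and applying it to a page's feature vector. The vector layout (hidden weights, hidden biases, output weights, output bias), the sigmoid activations, the 0.5 threshold, and the names `decode` and `is_first_language_page` are all assumptions made for illustration; the claim does not fix any of them.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[List[float]], List[float], List[float], float]


def decode(individual: Sequence[float], n_in: int, n_hidden: int) -> Params:
    """Split a flat vector into weights and biases (assumed layout)."""
    expected = n_hidden * n_in + n_hidden + n_hidden + 1
    if len(individual) != expected:
        raise ValueError(f"expected {expected} values, got {len(individual)}")
    pos = 0
    w_hidden = []
    for _ in range(n_hidden):  # one weight row per hidden neuron
        w_hidden.append(list(individual[pos:pos + n_in]))
        pos += n_in
    b_hidden = list(individual[pos:pos + n_hidden])
    pos += n_hidden
    w_out = list(individual[pos:pos + n_hidden])
    pos += n_hidden
    b_out = individual[pos]
    return w_hidden, b_hidden, w_out, b_out


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def is_first_language_page(x: Sequence[float], params: Params,
                           threshold: float = 0.5) -> bool:
    """Forward pass: input layer -> hidden layer -> single output neuron."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return output >= threshold
```

With `n_in` inputs and `n_hidden` hidden neurons, an individual under this layout holds `n_hidden * n_in + 2 * n_hidden + 1` values, which would be the number of parameters the optimizer searches over.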
2. The method according to claim 1, characterized in that obtaining the text attribute values of web pages in the training sample that contain the first language comprises:
Obtaining, for a web page that contains the first language, the proportion of the first language, the character count of the first language, and the total character count of the web page;
Determining, from the proportion and the character count, the mean of the proportion, the variance of the proportion, the mean of the character count, and the variance of the character count;
Using the proportion of the first language, the character count of the first language, the total character count of the web page, the mean of the proportion, the variance of the proportion, the mean of the character count, and the variance of the character count as the first parameter value;
Determining the second parameter value based on the first parameter value.
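The seven quantities in claim 2 can be computed as in the following Python sketch. Several points are assumptions rather than facts from the patent: the first language is detected with a Unicode range test (`is_cjk`, CJK Unified Ideographs), whitespace is excluded from the character counts, the mean and variance are taken across all pages in the training sample, and the population variance is used. The function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence, Tuple


def is_cjk(ch: str) -> bool:
    """Assumed first-language test: CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def page_counts(text: str,
                in_first_language: Callable[[str], bool]) -> Tuple[float, int, int]:
    """Return (proportion, first-language character count, total character count)."""
    chars = [c for c in text if not c.isspace()]  # whitespace ignored (assumption)
    total = len(chars)
    first = sum(1 for c in chars if in_first_language(c))
    proportion = first / total if total else 0.0
    return proportion, first, total


def first_parameter_values(pages: Sequence[str],
                           in_first_language: Callable[[str], bool] = is_cjk
                           ) -> List[tuple]:
    """Build one 7-element feature tuple per page."""
    counts = [page_counts(p, in_first_language) for p in pages]
    proportions = [c[0] for c in counts]
    amounts = [c[1] for c in counts]
    # Sample-wide statistics, shared by every page (assumed interpretation).
    stats = (mean(proportions), pvariance(proportions),
             mean(amounts), pvariance(amounts))
    return [c + stats for c in counts]
```

Each returned tuple is ordered as proportion, first-language character count, total character count, mean of proportion, variance of proportion, mean of character count, variance of character count.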
3. The method according to claim 1, characterized in that the population used to determine the connection weights and bias values of the three-layer perceptron neural network is generated by the following steps:
Step 11: set the lower bound LB_j and the upper bound UB_j of the DIM parameters to be optimized of the three-layer perceptron neural network, where the subscript j = 1, 2, ..., D;
Step 12: randomly generate a first population P_t containing Popsize individuals, where each individual in the first population stores the DIM parameters to be optimized;
Wherein the subscript i = 1, 2, ..., Popsize, and […] is the i-th individual in P_t.
4. The method according to claim 3, characterized in that the formula for random initialization is:
Wherein the subscript j = 1, 2, ..., D, and rand(0,1) is a function that generates a uniformly distributed random real number in [0,1].
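The initialization formula itself is not reproduced in this text. A common form that is consistent with the definitions in claims 3 and 4 draws each parameter uniformly between its bounds, x = LB_j + rand(0,1) × (UB_j − LB_j); the Python sketch below assumes that form, and the name `init_population` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def init_population(popsize: int,
                    lower: Sequence[float],
                    upper: Sequence[float],
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """Create popsize individuals, each with one value per bounded parameter."""
    if len(lower) != len(upper):
        raise ValueError("lower and upper bounds must have the same length")
    rng = rng or random.Random()
    # Assumed formula: x_ij = LB_j + rand(0,1) * (UB_j - LB_j)
    return [
        [lb + rng.random() * (ub - lb) for lb, ub in zip(lower, upper)]
        for _ in range(popsize)
    ]
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing optimizer settings.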
5. The method according to claim 3, characterized in that the individual with the optimal adaptive value in the population is decoded to obtain the connection weights and bias values of the three-layer perceptron neural network by the following steps:
Step 20: set the maximum number of evaluations MAX_FEs, set the current number of evaluations FEs = 0, set the current generation t = 0, and calculate the adaptive value of each individual in the first population P_t;
Step 21: let the current number of evaluations FEs = FEs + Popsize, and save the individual Best_t that has the best adaptive value among all individuals;
Step 22: calculate the current gravitational constant G_t and the mass of each individual in the first population P_t, where the current gravitational constant G_t is determined by the following formula:
Step 23: determine the current number of elite individuals KBest_t in the first population P_t according to the current gravitational constant G_t and the mass of each individual, where the current number of elite individuals KBest_t is determined by the following formula:
Step 24: update the acceleration, velocity, and position of each individual in the first population P_t to obtain a second population, calculate the adaptive value of each individual in the second population, and let the current number of evaluations FEs = FEs + Popsize;
Step 25: generate an intermediate chaos factor cf in the second population;
Step 26: randomly generate a positive integer R1 in [1, Popsize] of the second population, randomly generate a positive integer R2 in [1, Popsize] that is not equal to R1, and calculate the intermediate chaos factor cf;
Step 27: generate an individual U_t, where the generating formula is as follows:
Step 28: calculate the adaptive value Fit(U_t) of the individual U_t; if the adaptive value Fit(U_t) of the individual U_t is better than the adaptive value of […], go to step 29, otherwise go to step 26;
Step 29: let the current generation t = t + 1 and save the optimum individual Best_t in the second population; after the current number of evaluations FEs exceeds MAX_FEs, decode the obtained optimum individual Best_t into the connection weights and bias values of the three-layer perceptron neural network.
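The formulas for G_t and KBest_t in steps 22 and 23 are not reproduced in this text. For orientation only, the Python sketch below uses textbook gravitational search algorithm forms as stand-ins: an exponentially decaying gravitational constant, masses normalized from fitness for a minimization problem, and an elite count that shrinks linearly from Popsize to 1. The defaults `g0=100` and `alpha=20`, the `progress` argument (for example FEs / MAX_FEs), and all function names are assumptions and may differ from the patent's own formulas.

```python
import math
from typing import List, Sequence


def gravitational_constant(progress: float, g0: float = 100.0,
                           alpha: float = 20.0) -> float:
    """Textbook decay G = G0 * exp(-alpha * progress), progress in [0, 1]."""
    return g0 * math.exp(-alpha * progress)


def masses(fitness_values: Sequence[float]) -> List[float]:
    """Normalized masses for minimization: the best individual is heaviest."""
    best, worst = min(fitness_values), max(fitness_values)
    n = len(fitness_values)
    if best == worst:  # all individuals equal: share the mass evenly
        return [1.0 / n] * n
    raw = [(worst - f) / (worst - best) for f in fitness_values]
    total = sum(raw)
    return [m / total for m in raw]


def elite_count(progress: float, popsize: int) -> int:
    """Number of elite individuals, shrinking linearly from popsize to 1."""
    return max(1, round(popsize - (popsize - 1) * progress))
```

Under these stand-in forms, early generations let every individual attract every other one (broad exploration), while late generations are driven by only the heaviest few.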
6. The method according to claim 5, characterized in that the intermediate chaos factor cf is generated by the following steps:
Step 30: initialize an intermediate chaos factor, and set the number of updates Num of the initialized intermediate chaos factor;
Step 31: set the initialized intermediate chaos factor to a random real number within a first value range; if the initialized intermediate chaos factor equals a second preset value, regenerate it until it is not equal to the second preset value, where the second preset value lies within the first value range;
Step 32: let the counter ki = 1; if the counter ki is greater than Num, randomly select an individual and perform a chaos local search on that individual, otherwise go to step 33;
Step 33: update the initialized intermediate chaos factor to obtain the intermediate chaos factor cf, using the following formula:
cf = 4 × cf × (1 − cf);
Step 34: let the counter ki = ki + 1, and go to step 32.
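The update in step 33 is the logistic map with parameter 4. The Python sketch below iterates it Num times from a random start. The claim does not state the first value range or the second preset value, so the range (0, 1) and the excluded values 0.25, 0.5, and 0.75 used here are assumptions (values that drive this map to a fixed point), and the names `logistic_step` and `chaos_factor` are hypothetical.

```python
import random
from typing import Optional, Sequence


def logistic_step(cf: float) -> float:
    """One update of step 33: cf = 4 * cf * (1 - cf)."""
    return 4.0 * cf * (1.0 - cf)


def chaos_factor(num_updates: int,
                 rng: Optional[random.Random] = None,
                 excluded: Sequence[float] = (0.25, 0.5, 0.75)) -> float:
    """Draw a start value, reject excluded values, then iterate the map."""
    rng = rng or random.Random()
    cf = rng.random()
    # Regenerate while the start value is one that collapses the sequence
    # (assumed choice of the "second preset value").
    while cf == 0.0 or cf in excluded:
        cf = rng.random()
    for _ in range(num_updates):
        cf = logistic_step(cf)
    return cf
```

The excluded start values matter because 0.5 maps to 1 and then to 0, while 0.25 and 0.75 land on the fixed point 0.75, so the sequence would stop varying.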
7. A web page processing apparatus, characterized by comprising:
A first obtaining module, configured to obtain text attribute values of web pages in a training sample that contain a first language, wherein the text attribute values include: a first parameter value that corresponds to the first language in the web page, and a second parameter value that indicates whether the web page is text primarily in the first language;
A first determining module, configured to use the first parameter value as the input variable of a perceptron neural network to determine a third parameter value that indicates whether the web page is text primarily in the first language;
A second determining module, configured to determine the adaptive value of the individuals of a population in the perceptron neural network according to the second parameter value and the third parameter value;
A decoding module, configured to decode the individual with the optimal adaptive value in the population to obtain the connection weights and bias values of the perceptron neural network;
A third determining module, configured to determine, based on the connection weights and bias values, whether a web page to be processed is text primarily in the first language.
8. A storage medium, characterized in that a computer program is stored in the storage medium, wherein the computer program is configured to execute, when run, the method of any one of claims 1 to 6.
9. An electronic device, including a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method of any one of claims 1 to 6.
CN201810725738.2A 2018-07-04 2018-07-04 The processing method and processing device of webpage, storage medium, electronic device Active CN108984694B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810725738.2A CN108984694B (en) 2018-07-04 2018-07-04 The processing method and processing device of webpage, storage medium, electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810725738.2A CN108984694B (en) 2018-07-04 2018-07-04 The processing method and processing device of webpage, storage medium, electronic device

Publications (2)

Publication Number Publication Date
CN108984694A CN108984694A (en) 2018-12-11
CN108984694B true CN108984694B (en) 2019-07-30

Family

ID=64536124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810725738.2A Active CN108984694B (en) 2018-07-04 2018-07-04 The processing method and processing device of webpage, storage medium, electronic device

Country Status (1)

Country Link
CN (1) CN108984694B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984692B (en) * 2018-07-04 2019-06-21 龙马智芯(珠海横琴)科技有限公司 The processing method and processing device of webpage, storage medium, electronic device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874410A (en) * 2017-01-22 2017-06-20 清华大学 Chinese microblogging text mood sorting technique and its system based on convolutional neural networks
CN108170660A (en) * 2018-01-22 2018-06-15 北京百度网讯科技有限公司 Display methods, device, browser, terminal and the medium of multilingual typesetting

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870595A (en) * 2014-04-01 2014-06-18 深圳市科盾科技有限公司 Data mining system and method
CN105302884B (en) * 2015-10-19 2019-02-19 天津海量信息技术股份有限公司 Webpage mode identification method and visual structure learning method based on deep learning
US10672025B2 (en) * 2016-03-08 2020-06-02 Oath Inc. System and method for traffic quality based pricing via deep neural language models
CN106651030B (en) * 2016-12-21 2020-08-04 重庆邮电大学 Improved RBF neural network hot topic user participation behavior prediction method
CN108021555A (en) * 2017-11-21 2018-05-11 浪潮金融信息技术有限公司 A kind of Question sentence parsing measure based on depth convolutional neural networks
CN108984692B (en) * 2018-07-04 2019-06-21 龙马智芯(珠海横琴)科技有限公司 The processing method and processing device of webpage, storage medium, electronic device

Also Published As

Publication number Publication date
CN108984694A (en) 2018-12-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 519031 office 1316, No. 1, lianao Road, Hengqin new area, Zhuhai, Guangdong

Patentee after: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd.

Address before: 519000 room 417, building 20, creative Valley, Hengqin new area, Xiangzhou, Zhuhai, Guangdong

Patentee before: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd.