CN108628817A - Data processing method and device - Google Patents
- Publication number: CN108628817A
- Application number: CN201710153501.7A
- Authority: CN (China)
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Abstract
An embodiment of the present invention provides a data processing method and device. The method includes: obtaining a first data set to be processed; obtaining the length of the data included in the first data set; and comparing the lengths of the data included in the first data set and, according to the comparison result, determining a second data set from the first data set, where a correspondence exists between the data included in the second data set. Embodiments of the present invention make it possible to extract data having a correspondence accurately and quickly.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a data processing method and device.
Background
In practical applications involving big data processing, it is often necessary to pick out data of the same type from a large volume of data and to filter out the useless data mixed in with it. Consider the collection of translation source material. Translation source material provides translation correspondences between different languages, so a correspondence exists between the data items of the material. If useless data is mixed into the source data extracted from web pages, it seriously interferes with the collection of the material, and the data having a correspondence cannot be extracted from the source data accurately and quickly. How to extract data having a correspondence accurately and quickly has therefore become an urgent problem in the collection of translation source material.
Summary of the invention
Embodiments of the present invention provide a data processing method and device that can accurately and quickly extract data having a correspondence.
A first aspect of the embodiments of the present invention provides a data processing method, including:
Obtaining a first data set to be processed.
Obtaining the length of the data included in the first data set.
Comparing the lengths of the data included in the first data set and, according to the comparison result, determining a second data set from the first data set, where a correspondence exists between the data included in the second data set.
A second aspect of the embodiments of the present invention provides a data processing device, including:
An acquisition module, configured to obtain a first data set to be processed.
The acquisition module is further configured to obtain the length of the data included in the first data set.
A processing module, configured to compare the lengths of the data included in the first data set and, according to the comparison result, determine a second data set from the first data set, where a correspondence exists between the data included in the second data set.
With the embodiments of the present invention, the first data set to be processed and the length of the data it includes can be obtained, the lengths of the data in the first data set can be compared, and the second data set can be determined from the first data set according to the comparison result. Because a correspondence exists between the data included in the second data set, data having a correspondence can be extracted accurately and quickly.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a data processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of web page content provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a data processing device provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a terminal provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terminal described in the embodiments of the present invention may specifically include a desktop computer, a laptop, a smartphone, a tablet computer and the like.
Referring to Fig. 1, which is a flow diagram of a data processing method provided by an embodiment of the present invention. The data processing method described in this embodiment includes the following steps:
101. The terminal obtains a first data set to be processed.
In a specific implementation, the terminal can obtain the first data set to be processed by web crawler technology: the terminal sends a request for a Hypertext Markup Language (HTML) page to a web server, receives the HTML data returned by the web server, and uses a third-party library (such as BeautifulSoup) to format and parse the HTML data, thereby obtaining the first data set to be processed.
The HTML data may specifically be web page data that provides translation source material. Translation source material provides translation correspondences between different languages and may take the form of paragraphs, short sentences and so on. The data in the first data set obtained by the terminal includes translation source material and may also include some useless data. As shown in Fig. 2, the first data set to be processed includes four groups of translation source material, numbered 1, 2, 3 and 4, together with the useless data in the rectangular frame.
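The parsing half of step 101 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: it swaps the BeautifulSoup library named above for Python's standard-library html.parser so that it has no dependencies, it works on an HTML string that has already been downloaded, and the names TextBlockExtractor, extract_first_data_set and the choice of block-level tags are hypothetical.

```python
from html.parser import HTMLParser


class TextBlockExtractor(HTMLParser):
    """Collect the visible text of each block-level element as one data item."""

    # Hypothetical choice of tags that delimit one candidate data item.
    BLOCK_TAGS = {"p", "li", "div", "td", "h1", "h2", "h3"}
    # Content of these tags is never visible text.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []       # the first data set, in page order
        self._buffer = []      # text fragments of the block being read
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def flush_block(self):
        """Close the current block and store it if it contains any text."""
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)


def extract_first_data_set(html_text):
    """Return the text blocks of an HTML document as a list of strings."""
    parser = TextBlockExtractor()
    parser.feed(html_text)
    parser.close()
    parser.flush_block()  # keep any text that follows the last block tag
    return parser.blocks
```

The result is an ordered list of strings, which is the form the later length comparison needs: translation pairs stay adjacent, and any useless text on the page appears as extra items between them.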
102. The terminal obtains the length of the data included in the first data set.
103. The terminal compares the lengths of the data included in the first data set and, according to the comparison result, determines a second data set from the first data set, where a correspondence exists between the data included in the second data set.
The length of a data item may be the length of the character string that the data item contains.
In a specific implementation, because translation source material provides translation correspondences between different languages, two adjacent data items in the first data set that are translation source material are same-type data, and their lengths should not differ much. Useless data in the first data set has no translation correspondence with other data and is not same-type data, so its length can be assumed to differ considerably from the length of adjacent data. The terminal can therefore compare the lengths of every two adjacent data items in the first data set and, according to the comparison result, obtain the target data, namely the data whose length and the length of the adjacent data satisfy a preset condition. The target data forms the second data set. A correspondence exists between the data included in the second data set; specifically, each pair of adjacent data items is in a translation correspondence between different languages. In this way, when collecting translation source material, the terminal can accurately and quickly extract the data having a correspondence and effectively filter out useless data.
Further, the terminal may output the data in the second data set to the terminal user according to the correspondence, so that the user can obtain the processing result promptly and, for example, verify it.
In some feasible embodiments, assume that the first data set includes n data items, where n is a positive integer. The terminal may then compare the lengths of adjacent data in the first data set and obtain the target data, whose length and the length of the adjacent data satisfy the preset condition, as follows:
Starting from the 1st data item in the first data set (i.e. i=1), the terminal judges, according to the length of the i-th data item and the length of the (i+1)-th data item, whether the i-th data item and the (i+1)-th data item are same-type data, where i is a positive integer and i≤n-1. If the i-th and (i+1)-th data items are same-type data (i.e. a translation correspondence exists between them), the terminal determines them as target data. It then judges, according to the length of the (i+2)-th data item and the length of the (i+3)-th data item, whether the (i+2)-th and (i+3)-th data items are same-type data, and determines target data according to the judgment result, and so on until i=n-1. The entire first data set is thus traversed, all target data having a translation correspondence is determined, and the useless data in the first data set is filtered out.
Further, if the i-th and (i+1)-th data items are not same-type data (i.e. no translation correspondence exists between them), the terminal determines that the i-th data item is useless data and does not treat it as target data. It then judges, according to the length of the (i+1)-th data item and the length of the (i+2)-th data item, whether the (i+1)-th and (i+2)-th data items are same-type data, and determines target data according to the judgment result, and so on until i=n-1. The entire first data set is thus traversed, all target data having a translation correspondence is determined, and the useless data in the first data set is filtered out.
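The traversal described above can be sketched in Python as follows. This is a minimal sketch rather than the patent's own code: the function name select_target_data is an assumption, the same-type test is passed in as a parameter because it is defined separately, and indices are 0-based whereas the text counts from 1.

```python
def select_target_data(items, is_same_type):
    """Return the items kept as target data (the second data set).

    items: the first data set, in its original order.
    is_same_type: callable taking two lengths and returning True when the
        two adjacent items are considered same-type data.
    """
    targets = []
    i = 0  # 0-based index of the item currently being examined
    while i < len(items) - 1:
        if is_same_type(len(items[i]), len(items[i + 1])):
            # Items i and i+1 form a pair: keep both, continue with i+2.
            targets.extend([items[i], items[i + 1]])
            i += 2
        else:
            # Item i is treated as useless data; retry with items i+1 and i+2.
            i += 1
    return targets


# Placeholder predicate for illustration only (an assumption, not the
# preset condition of the embodiments): lengths differ by fewer than 4.
def close_in_length(len_a, len_b):
    return abs(len_a - len_b) < 4


sample = ["hello", "hallo", "xx", "good morning", "guten Morgen", "z" * 25]
pairs = select_target_data(sample, close_in_length)
```

With this sample, "xx" and the 25-character string are skipped and the two translation pairs are kept. An item left without a partner at the end of the list is dropped, which follows from the loop stopping once fewer than two items remain.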
In some feasible embodiments, assume that the length of the i-th data item is $L_i$ and the length of the (i+1)-th data item is $L_{i+1}$. Then:
When the i-th and (i+1)-th data items (i.e. the two adjacent data items) are both short, same-type data can be defined as follows:
$L_i$ and $L_{i+1}$ are both less than a preset first value, and $|L_i - L_{i+1}|$ is less than a preset second value.
When at least one of the i-th and (i+1)-th data items (i.e. the two adjacent data items) is long, same-type data can be defined as follows:
At least one of $L_i$ and $L_{i+1}$ is greater than or equal to the preset first value, and the ratio $|L_i - L_{i+1}| / \max(L_i, L_{i+1})$ is less than a preset third value.
The preset first value may be 10, the preset second value may be 4, and the preset third value may be 0.22. That is, when both adjacent data items are short (length less than 10), they are considered same-type data if their length difference is less than 4. When at least one of the two adjacent data items is long (length greater than or equal to 10), they are considered same-type data if the ratio of the absolute value of their length difference to the larger of the two lengths is less than 0.22. The preset first, second and third values can be determined through a large number of experiments. Take the preset third value as an example: if it is too small, a judgment that two adjacent data items are same-type data is very accurate, but data that should be included in the second data set may be rejected; if it is too large, some useless data may be misjudged as target data and included in the second data set. Therefore, to ensure that useless data is filtered out while as much target data as possible is obtained, a suitable value can be chosen as the preset third value through extensive experimentation.
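The two-case definition of same-type data, with the example thresholds 10, 4 and 0.22, can be written as a small predicate. This is an illustrative sketch: the constant and function names are assumptions, and the thresholds are only the example values given in the text.

```python
FIRST_VALUE = 10    # preset first value: boundary between short and long items
SECOND_VALUE = 4    # preset second value: absolute-difference limit, short case
THIRD_VALUE = 0.22  # preset third value: relative-difference limit, long case


def is_same_type(len_a, len_b):
    """Return True if two adjacent items with these lengths are same-type data."""
    diff = abs(len_a - len_b)
    if len_a < FIRST_VALUE and len_b < FIRST_VALUE:
        # Both items are short: compare the absolute difference.
        return diff < SECOND_VALUE
    # At least one item is long, so max(...) is at least FIRST_VALUE and
    # the division is safe: compare the relative difference.
    return diff / max(len_a, len_b) < THIRD_VALUE
```

For example, lengths 7 and 8 pass the short-case test (difference 1), and lengths 23 and 24 pass the long-case test (1/24 is about 0.04). Lengths 8 and 11 fail even though they differ by only 3, because 11 reaches the first value and 3/11 is about 0.27, which exceeds 0.22.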
For example, as shown in Fig. 2, the first data set includes 9 data items, numbered 1, 2, 3, 4, 5, 6, 7, 8 and 9 from top to bottom. The 1st and 2nd data items correspond to translation source material 1, the 3rd and 4th to translation source material 2, the 5th and 6th to translation source material 3, and the 8th and 9th to translation source material 4; the 7th data item is the useless data in the rectangular frame. According to the above definition of same-type data, the terminal can determine that the 1st and 2nd data items, the 3rd and 4th data items, and the 5th and 6th data items are same-type data. The 7th and 8th data items do not satisfy the definition because their lengths differ too much, so the terminal determines that the 7th data item is useless data and that the 8th and 9th data items are same-type data. The second data set thus determined includes the four groups of translation source material 1, 2, 3 and 4, and the useless data (i.e. the 7th data item) is filtered out.
Further, the resulting second data set (i.e. the translation source material) can be used in the training and learning process that precedes machine translation, so that an accurate translation function can be provided in practical applications.
In some feasible embodiments, a specific implementation process can be described as follows:
Assume that the array list corresponds to the first data set and the array newlist corresponds to the second data set, with list=['abcdefg', 'higklmno', 'placeholder', 'Hoeing grain under the midday sun, sweat drips onto the soil beneath; who knows that every grain on the plate comes from hard toil', 'The white sun sets behind the mountains, the Yellow River flows into the sea; to see a thousand miles further, climb one more storey.', 'translate', 'happyness', 'I am just a string that is here to make trouble, please weed me out OK', 'Modesty helps one to go forward', 'Pride goes before, and shame comes after'];
flag is initialized to 0;
Starting from i=0, the following flow is executed repeatedly, with the value of i increased by 1 after each execution, until the value of i reaches the number of elements in the array list:
If i is equal to flag, strlen is set to the data length of list[i], and this execution of the flow ends;
If i is greater than flag, judge whether the element corresponding to strlen and list[i] are same-type data. If so, set flag to i+1 and store list[i-1] and list[i] in the array newlist. If not, set flag to i, so that list[i] becomes the new reference element and strlen takes the data length of list[i]. This execution of the flow then ends.
Further, the array newlist can be output after the above flow has finished.
The specific process of judging whether the element corresponding to strlen and list[i] are same-type data can be described as follows:
Determine the maximum of the two numbers strlen and the data length of list[i], and calculate the ratio of the absolute value of the difference between strlen and the data length of list[i] to that maximum. If the maximum is less than the preset first value (e.g. 10) and the absolute value of the difference between strlen and the data length of list[i] is less than the preset second value (e.g. 4), or if the ratio is less than the preset third value (e.g. 0.22), the element corresponding to strlen and list[i] are determined to be same-type data; otherwise, they are determined not to be same-type data.
The result of the above implementation is that the 3rd data item ('placeholder') and the 8th data item ('I am just a string that is here to make trouble, please weed me out OK') in list are useless data and are therefore not placed in newlist. The output format of newlist can be as follows:
abcdefg higklmno are the same group
Hoeing grain under the midday sun, sweat drips onto the soil beneath; who knows that every grain on the plate comes from hard toil The white sun sets behind the mountains, the Yellow River flows into the sea; to see a thousand miles further, climb one more storey. are the same group
translate happyness are the same group
Modesty helps one to go forward Pride goes before, and shame comes after are the same group
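The flag-based flow above can be sketched in Python as follows. This is a sketch under stated assumptions, not the original program: the names same_type, filter_pairs and format_groups are hypothetical, the sample strings are invented for illustration, and the sketch assumes that when a comparison fails, list[i] becomes the new reference element (strlen is refreshed), which is what the stated result requires.

```python
def same_type(strlen, item):
    """Same-type test of the flow: short-case rule OR ratio rule."""
    length = len(item)
    largest = max(strlen, length)
    diff = abs(strlen - length)
    if largest < 10 and diff < 4:
        return True
    # largest is never 0 here: two empty strings already passed the test above.
    return diff / largest < 0.22


def filter_pairs(lst):
    """Return newlist: the paired items of lst, with useless data removed."""
    newlist = []
    flag = 0    # index of the current reference element
    strlen = 0  # data length of the reference element
    for i in range(len(lst)):
        if i > flag:
            if same_type(strlen, lst[i]):
                flag = i + 1                     # next item starts a new pair
                newlist.extend([lst[i - 1], lst[i]])
            else:
                flag = i                         # lst[i] becomes the reference
        if i == flag:
            strlen = len(lst[i])
    return newlist


def format_groups(newlist):
    """Render each consecutive pair on one line."""
    return [f"{a} {b} are the same group"
            for a, b in zip(newlist[0::2], newlist[1::2])]


# Hypothetical sample data: four pairs plus two useless items.
sample = ["abcdefg", "higklmno", "xx",
          "The quick brown fox jumps", "Le renard brun rapide saute",
          "translate", "happyness",
          "I am a junk string, please remove me, OK?",
          "good day", "bonjour"]
result = filter_pairs(sample)
```

Here "xx" and the junk string are dropped and the remaining eight items come out as four pairs. Checking `i > flag` before `i == flag` lets a failed comparison fall through to the refresh of strlen within the same iteration.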
In the embodiments of the present invention, the terminal can obtain the first data set to be processed and the length of the data it includes, compare the lengths of the data in the first data set, and determine the second data set from the first data set according to the comparison result. Because a correspondence exists between the data included in the second data set, the terminal can accurately and quickly extract the data having a correspondence when collecting translation source material and effectively filter out useless data.
Referring to Fig. 3, which is a structural schematic diagram of a data processing device provided by an embodiment of the present invention. The data processing device described in this embodiment includes:
An acquisition module 301, configured to obtain a first data set to be processed.
The acquisition module 301 is further configured to obtain the length of the data included in the first data set.
A processing module 302, configured to compare the lengths of the data included in the first data set and, according to the comparison result, determine a second data set from the first data set, where a correspondence exists between the data included in the second data set.
In some feasible embodiments, the acquisition module 301 is specifically configured to:
Parse web page content and extract the first data set to be processed.
In some feasible embodiments, the processing module 302 is specifically configured to:
Compare the lengths of adjacent data in the first data set and obtain target data whose length and the length of the adjacent data satisfy a preset condition, the target data forming the second data set.
In some feasible embodiments, the first data set includes n data items, n being a positive integer, and the processing module 302 includes:
A judging unit 3020, configured to judge, starting from i=1 and according to the length of the i-th data item and the length of the (i+1)-th data item, whether the i-th data item and the (i+1)-th data item are same-type data, where i is a positive integer and i≤n-1.
A determination unit 3021, configured to determine the i-th data item and the (i+1)-th data item as target data if the judging unit judges that they are same-type data.
The judging unit 3020 is further configured to judge, according to the length of the (i+2)-th data item and the length of the (i+3)-th data item, whether the (i+2)-th data item and the (i+3)-th data item are same-type data, until i=n-1.
In some feasible embodiments, the judging unit 3020 is further configured to, if it judges that the i-th data item and the (i+1)-th data item are not same-type data, judge, according to the length of the (i+1)-th data item and the length of the (i+2)-th data item, whether the (i+1)-th data item and the (i+2)-th data item are same-type data, until i=n-1.
In some feasible embodiments, the length of the i-th data item is $L_i$ and the length of the (i+1)-th data item is $L_{i+1}$, where:
The i-th data item and the (i+1)-th data item being same-type data includes:
$L_i$ and $L_{i+1}$ are both less than a preset first value, and $|L_i - L_{i+1}|$ is less than a preset second value.
Or,
At least one of $L_i$ and $L_{i+1}$ is greater than or equal to the preset first value, and the ratio $|L_i - L_{i+1}| / \max(L_i, L_{i+1})$ is less than a preset third value.
It can be understood that the functions of the functional modules and units of the data processing device of this embodiment can be implemented according to the method in the above method embodiment. For the specific implementation process, refer to the related description of the method embodiment, which is not repeated here.
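One way to picture the module decomposition of Fig. 3 in software is the following sketch. It is only an illustration under assumptions: the class and method names are hypothetical, the reference numerals in the comments map each class to the module or unit it stands for, the acquisition module receives ready-made text items instead of parsing a web page, and the thresholds are the example values 10, 4 and 0.22.

```python
class AcquisitionModule:
    """Stands for acquisition module 301."""

    def get_first_data_set(self, source):
        # Assumption: `source` is already an iterable of text items;
        # web page parsing is out of scope for this sketch.
        return list(source)

    def get_lengths(self, data_set):
        return [len(item) for item in data_set]


class JudgingUnit:
    """Stands for judging unit 3020: decides whether two lengths are same-type."""

    def __init__(self, first=10, second=4, third=0.22):
        self.first, self.second, self.third = first, second, third

    def is_same_type(self, len_a, len_b):
        diff = abs(len_a - len_b)
        if len_a < self.first and len_b < self.first:
            return diff < self.second
        return diff / max(len_a, len_b) < self.third


class DeterminationUnit:
    """Stands for determination unit 3021: records pairs as target data."""

    def __init__(self):
        self.targets = []

    def add_pair(self, item_a, item_b):
        self.targets.extend([item_a, item_b])


class ProcessingModule:
    """Stands for processing module 302."""

    def __init__(self):
        self.judging_unit = JudgingUnit()

    def process(self, data_set, lengths):
        determination_unit = DeterminationUnit()
        i = 0
        while i < len(data_set) - 1:
            if self.judging_unit.is_same_type(lengths[i], lengths[i + 1]):
                determination_unit.add_pair(data_set[i], data_set[i + 1])
                i += 2  # move past the pair
            else:
                i += 1  # item i is useless data
        return determination_unit.targets


acquisition = AcquisitionModule()
first_set = acquisition.get_first_data_set(
    ["one two", "un deux", "x" * 40, "three", "trois"])
second_set = ProcessingModule().process(
    first_set, acquisition.get_lengths(first_set))
```

Keeping the judging unit separate from the traversal means the thresholds can be tuned or the same-type rule replaced without touching the loop that walks the data set.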
In the embodiments of the present invention, the acquisition module 301 obtains the first data set to be processed and the length of the data it includes, and the processing module 302 compares the lengths of the data in the first data set and determines the second data set from the first data set according to the comparison result. Because a correspondence exists between the data included in the second data set, the data having a correspondence can be extracted accurately and quickly when translation source material is collected, and useless data is effectively filtered out.
Referring to Fig. 4, which is a structural schematic diagram of a terminal provided by an embodiment of the present invention. The terminal described in this embodiment includes a processor 401, a network interface 402 and a memory 403. The processor 401, the network interface 402 and the memory 403 may be connected by a bus or in other ways; connection by a bus is taken as an example in this embodiment.
The processor 401 (or central processing unit, CPU) is the computing core and control core of the terminal. It can parse the various instructions in the terminal and process the various data of the terminal. For example, the CPU can parse a power-on or power-off instruction sent to the terminal by the user and control the terminal to power on or off; as another example, the CPU can transmit various kinds of interaction data between the internal structures of the terminal. The network interface 402 may optionally include standard wired and wireless interfaces (such as Wi-Fi and mobile communication interfaces) and sends and receives data under the control of the processor 401. The memory 403 is the storage device of the terminal and stores programs and data. It can be understood that the memory 403 here may include the built-in memory of the terminal and may also include extended memory supported by the terminal. The memory 403 provides storage space that stores the operating system of the terminal, which may include but is not limited to the Windows system, the Android system, the iOS system and so on; the present invention places no limitation on this.
In the embodiments of the present invention, the processor 401 performs the following operations by running the executable program code in the memory 403:
The processor 401 is configured to obtain a first data set to be processed through the network interface 402.
The processor 401 is further configured to obtain the length of the data included in the first data set.
The processor 401 is further configured to compare the lengths of the data included in the first data set and, according to the comparison result, determine a second data set from the first data set, where a correspondence exists between the data included in the second data set.
In some feasible embodiments, the processor 401 is specifically configured to:
Parse web page content and extract the first data set to be processed.
In some feasible embodiments, the processor 401 is specifically configured to:
Compare the lengths of adjacent data in the first data set and obtain target data whose length and the length of the adjacent data satisfy a preset condition, the target data forming the second data set.
In some feasible embodiments, the first data set includes n data items, n being a positive integer, and the processor 401 is specifically configured to:
Starting from i=1, judge, according to the length of the i-th data item and the length of the (i+1)-th data item, whether the i-th data item and the (i+1)-th data item are same-type data, where i is a positive integer and i≤n-1.
If the i-th data item and the (i+1)-th data item are same-type data, determine the i-th data item and the (i+1)-th data item as target data, and judge, according to the length of the (i+2)-th data item and the length of the (i+3)-th data item, whether the (i+2)-th data item and the (i+3)-th data item are same-type data, until i=n-1.
In some feasible embodiments, the processor 401 is further configured to, if the i-th data item and the (i+1)-th data item are not same-type data, judge, according to the length of the (i+1)-th data item and the length of the (i+2)-th data item, whether the (i+1)-th data item and the (i+2)-th data item are same-type data, until i=n-1.
In some feasible embodiments, the length of the i-th data item is $L_i$ and the length of the (i+1)-th data item is $L_{i+1}$, where:
The i-th data item and the (i+1)-th data item being same-type data includes:
$L_i$ and $L_{i+1}$ are both less than a preset first value, and $|L_i - L_{i+1}|$ is less than a preset second value.
Or,
At least one of $L_i$ and $L_{i+1}$ is greater than or equal to the preset first value, and the ratio $|L_i - L_{i+1}| / \max(L_i, L_{i+1})$ is less than a preset third value.
In a specific implementation, the processor 401, the network interface 402 and the memory 403 described in the embodiments of the present invention can carry out the implementation described in the flow of the data processing method provided by the embodiments of the present invention, and can also carry out the implementation described for the data processing device provided by the embodiments of the present invention; details are not repeated here.
In the embodiments of the present invention, the processor 401 obtains, through the network interface 402, the first data set to be processed and the length of the data it includes, compares the lengths of the data in the first data set, and determines the second data set from the first data set according to the comparison result. Because a correspondence exists between the data included in the second data set, the data having a correspondence can be extracted accurately and quickly when translation source material is collected, and useless data is effectively filtered out.
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM) or the like.
What is disclosed above is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention. Those of ordinary skill in the art can understand all or part of the processes for implementing the above embodiment, and equivalent changes made in accordance with the claims of the present invention still fall within the scope covered by the invention.
Claims (12)
1. a kind of data processing method, which is characterized in that including:
Obtain the first pending data acquisition system;
Obtain the length for the data that first data acquisition system includes;
The comparison that the data that first data acquisition system includes are carried out with the length, according to comparison result from first data
Determine the second data set in set, there are correspondences between the data that the second data set includes.
2. according to the method described in claim 1, it is characterized in that, described obtain pending the first data acquisition system, including:
Web page contents are parsed, the first pending data acquisition system is extracted.
3. method according to claim 1 or 2, which is characterized in that the data for including to first data acquisition system
The comparison for carrying out the length determines the second data set according to comparison result from first data acquisition system, including:
The comparison of length between adjacent data is carried out to first data acquisition system, the length for obtaining length and adjacent data meets
The target data of preset condition, the target data form the second data set.
4. according to the method described in claim 3, it is characterized in that, first data acquisition system includes n data, the n is
Positive integer, the comparison that length between adjacent data is carried out to first data acquisition system, acquisition length and adjacent data
Length meets the target data of preset condition, including:
Since i=1, according to the length of the length and i+1 data of i-th of data, i-th of data and described are judged
Whether i+1 data are same type data, and the i is positive integer, and i≤n-1;
If i-th of data and the i+1 data are the same type data, by i-th of data and described
I+1 data are determined as target data, and the length of the length and the i-th+3 data according to the i-th+2 data, described in judgement
Whether the i-th+2 data and the i-th+3 data are the same type data, until i=n-1.
5. according to the method described in claim 4, it is characterized in that, the method further includes:
If i-th of data and the i+1 data are not the same type data, according to the i+1 data
Length and the i-th+2 data length, judge whether the i+1 data and the i-th+2 data are described similar
Type data, until i=n-1.
6. The method according to claim 4, wherein the length of the i-th data item is $L_i$ and the length of the (i+1)-th data item is $L_{i+1}$, and wherein:
the i-th data item and the (i+1)-th data item being data of the same type comprises:
$L_i$ and $L_{i+1}$ are both less than a preset first value, and the absolute value of the difference between $L_i$ and $L_{i+1}$ is less than a preset second value;
or,
at least one of $L_i$ and $L_{i+1}$ is greater than or equal to the preset first value, and the ratio of the absolute value of the difference between $L_i$ and $L_{i+1}$ to the larger of $L_i$ and $L_{i+1}$ is less than a preset third value.
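The two-branch condition of claim 6 can be written as a small predicate. The claim gives no concrete values for the three preset values, so the defaults below (20, 5 and 0.3) are placeholders chosen only for illustration, and the name `is_same_type` is hypothetical.

```python
def is_same_type(len_a, len_b, first_value=20, second_value=5, third_value=0.3):
    """Judge two adjacent lengths under the condition of claim 6.

    first_value, second_value and third_value stand for the preset first,
    second and third values; their defaults here are assumptions.
    """
    if first_value <= 0:
        raise ValueError("first_value must be positive")
    diff = abs(len_a - len_b)
    if len_a < first_value and len_b < first_value:
        # Short items: compare the absolute difference of the lengths.
        return diff < second_value
    # At least one item is long: compare the difference relative to the
    # larger length (non-zero here because first_value is positive).
    return diff / max(len_a, len_b) < third_value
```

Using an absolute threshold for short items and a relative one for long items keeps the test meaningful at both scales: a difference of 10 characters is large between two 5-character fields but small between two 100-character ones.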
7. A data processing device, comprising:
an acquisition module, configured to obtain a first data set to be processed;
the acquisition module being further configured to obtain the lengths of the data included in the first data set;
a processing module, configured to compare the lengths of the data included in the first data set and to determine a second data set from the first data set according to the comparison result, wherein a correspondence exists between the data included in the second data set.
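The module split of claim 7 can be pictured as two cooperating classes. This is a structural sketch only: the class names, the `source` callable and the `selector` callable are hypothetical, and the selection rule used in the example is a stand-in rather than the comparison defined in the dependent claims.

```python
from typing import Callable, List, Sequence


class AcquisitionModule:
    """Obtains the first data set and the lengths of its items."""

    def __init__(self, source: Callable[[], List[str]]):
        self._source = source  # hypothetical provider of raw data items

    def obtain_data_set(self) -> List[str]:
        return list(self._source())

    def obtain_lengths(self, data_set: Sequence[str]) -> List[int]:
        return [len(item) for item in data_set]


class ProcessingModule:
    """Determines the second data set from the length comparison."""

    def __init__(self, selector: Callable[[Sequence[str], Sequence[int]], List[str]]):
        self._selector = selector  # hypothetical length-based selection rule

    def determine_second_set(self, data_set, lengths) -> List[str]:
        return self._selector(data_set, lengths)


class DataProcessingDevice:
    """Wires the two modules together in the order given by claim 7."""

    def __init__(self, acquisition: AcquisitionModule, processing: ProcessingModule):
        self.acquisition = acquisition
        self.processing = processing

    def run(self) -> List[str]:
        first_set = self.acquisition.obtain_data_set()
        lengths = self.acquisition.obtain_lengths(first_set)
        return self.processing.determine_second_set(first_set, lengths)


# Stand-in rule for the example: keep items shorter than 10 characters.
device = DataProcessingDevice(
    AcquisitionModule(lambda: ["Name", "Alice", "A much longer unrelated sentence"]),
    ProcessingModule(lambda data, lens: [d for d, n in zip(data, lens) if n < 10]),
)
result = device.run()
```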
8. The device according to claim 7, wherein the acquisition module is specifically configured to:
parse web page content and extract the first data set to be processed.
9. The device according to claim 7 or 8, wherein the processing module is specifically configured to:
compare the lengths of adjacent data in the first data set, and obtain target data whose length and the length of the adjacent data satisfy a preset condition, the target data forming the second data set.
10. The device according to claim 9, wherein the first data set includes n data items, n being a positive integer, and the processing module comprises:
a judging unit, configured to judge, starting from i=1 and according to the length of the i-th data item and the length of the (i+1)-th data item, whether the i-th data item and the (i+1)-th data item are data of the same type, i being a positive integer and i≤n-1;
a determination unit, configured to determine the i-th data item and the (i+1)-th data item as target data if the judging unit judges that the i-th data item and the (i+1)-th data item are data of the same type;
the judging unit being further configured to judge, according to the length of the (i+2)-th data item and the length of the (i+3)-th data item, whether the (i+2)-th data item and the (i+3)-th data item are data of the same type, until i=n-1.
11. The device according to claim 10, wherein
the judging unit is further configured to, if it judges that the i-th data item and the (i+1)-th data item are not data of the same type, judge, according to the length of the (i+1)-th data item and the length of the (i+2)-th data item, whether the (i+1)-th data item and the (i+2)-th data item are data of the same type, until i=n-1.
12. The device according to claim 10, wherein the length of the i-th data item is $L_i$ and the length of the (i+1)-th data item is $L_{i+1}$, and wherein:
the i-th data item and the (i+1)-th data item being data of the same type comprises:
$L_i$ and $L_{i+1}$ are both less than a preset first value, and the absolute value of the difference between $L_i$ and $L_{i+1}$ is less than a preset second value;
or,
at least one of $L_i$ and $L_{i+1}$ is greater than or equal to the preset first value, and the ratio of the absolute value of the difference between $L_i$ and $L_{i+1}$ to the larger of $L_i$ and $L_{i+1}$ is less than a preset third value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710153501.7A CN108628817B (en) | 2017-03-15 | 2017-03-15 | Data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710153501.7A CN108628817B (en) | 2017-03-15 | 2017-03-15 | Data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108628817A (en) | 2018-10-09 |
CN108628817B (en) | 2022-07-26 |
Family ID: 63686575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710153501.7A Active CN108628817B (en) | 2017-03-15 | 2017-03-15 | Data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628817B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102810097A (en) * | 2011-06-02 | 2012-12-05 | 高德软件有限公司 | Method and device for extracting webpage text content |
CN104572946A (en) * | 2014-12-30 | 2015-04-29 | 小米科技有限责任公司 | Method and device for processing data of yellow pages |
CN104573097A (en) * | 2015-01-30 | 2015-04-29 | 湖南蚁坊软件有限公司 | Method for extracting webpage content |
WO2015165245A1 (en) * | 2014-04-30 | 2015-11-05 | 广州市动景计算机科技有限公司 | Webpage data processing method and device |
WO2015176689A1 (en) * | 2014-05-23 | 2015-11-26 | 华为技术有限公司 | Data processing method and device |
CN105447167A (en) * | 2015-12-04 | 2016-03-30 | 北京奇虎科技有限公司 | Processing method and apparatus for node cache data in distributed system |
CN106484730A (en) * | 2015-08-31 | 2017-03-08 | 北京国双科技有限公司 | Character string matching method and device |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102810097A (en) * | 2011-06-02 | 2012-12-05 | 高德软件有限公司 | Method and device for extracting webpage text content |
WO2015165245A1 (en) * | 2014-04-30 | 2015-11-05 | 广州市动景计算机科技有限公司 | Webpage data processing method and device |
WO2015176689A1 (en) * | 2014-05-23 | 2015-11-26 | 华为技术有限公司 | Data processing method and device |
CN104572946A (en) * | 2014-12-30 | 2015-04-29 | 小米科技有限责任公司 | Method and device for processing data of yellow pages |
CN104573097A (en) * | 2015-01-30 | 2015-04-29 | 湖南蚁坊软件有限公司 | Method for extracting webpage content |
CN106484730A (en) * | 2015-08-31 | 2017-03-08 | 北京国双科技有限公司 | Character string matching method and device |
CN105447167A (en) * | 2015-12-04 | 2016-03-30 | 北京奇虎科技有限公司 | Processing method and apparatus for node cache data in distributed system |
Also Published As
Publication number | Publication date |
---|---|
CN108628817B (en) | 2022-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105447204B (en) | Network address recognition methods and device | |
EP2472428B1 (en) | Response determining device, response determining method, response determining program, recording medium and response determining system | |
CN108134784A (en) | web page classification method and device, storage medium and electronic equipment | |
CN109388943A (en) | A kind of method, apparatus and computer readable storage medium identifying XSS attack | |
CN107153716B (en) | Webpage content extraction method and device | |
CN104298780B (en) | A kind of pre-acquiring method and system of browsing device net page information | |
US20210064453A1 (en) | Automated application programming interface (api) specification construction | |
CN107341399A (en) | Assess the method and device of code file security | |
CN103473107B (en) | A kind of method that interactive interface based on mobile middleware dynamically updates | |
CN108763274A (en) | Recognition methods, device, electronic equipment and the storage medium of access request | |
CN107463879A (en) | Human bodys' response method based on deep learning | |
CN107291778A (en) | The collection method and device of data | |
CN109299448A (en) | Resume intelligence filling method, system, server and storage medium | |
WO2020082763A1 (en) | Decision trees-based method and apparatus for detecting phishing website, and computer device | |
CN104462242B (en) | Webpage capacity of returns statistical method and device | |
CN110083755A (en) | A kind of high emulation parsing web-page approach, device and electronic equipment | |
CN102054040A (en) | Knowledge information interaction service method and site and questioning and answering interaction platform | |
CN109657125A (en) | Data processing method, device, equipment and storage medium based on web crawlers | |
CN104572787B (en) | The recognition methods of pseudo- original website and device | |
CN107784107A (en) | Dark chain detection method and device based on flight behavior analysis | |
CN107682350A (en) | Active defense method, device and electronic equipment based on web portal security scoring | |
CN108234441A (en) | Determine method, apparatus, electronic equipment and the storage medium of forgery access request | |
CN110110179A (en) | House market heating power ground drawing generating method, device, equipment and storage medium | |
CN108628817A (en) | A kind of data processing method and device | |
CN110781497B (en) | Method for detecting web page link and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||