CN112579855A - Method and device for extracting feature codes of WeChat article - Google Patents

Info

Publication number
CN112579855A
Authority
CN
China
Prior art keywords
link
wechat
character
feature
feature code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910941622.7A
Other languages
Chinese (zh)
Inventor
赵可建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201910941622.7A
Publication of CN112579855A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations

Abstract

The invention discloses a method and a device for extracting feature codes of WeChat articles and relates to the technical field of data processing. It optimizes the process of obtaining WeChat article feature code information, ensures the completeness of the obtained feature code information, and improves acquisition efficiency. The main technical scheme of the invention is as follows: acquiring a browsing link of a WeChat article; judging whether the browsing link is a permanent link; and if so, extracting the feature codes of the WeChat article from the permanent link. The method is applied to extracting WeChat article feature code information from a permanent link.

Description

Method and device for extracting feature codes of WeChat article
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for extracting feature codes of WeChat articles.
Background
A WeChat official account is an application account that a developer or merchant registers on the WeChat official accounts platform. Through the official account, the developer or merchant can communicate and interact with a specific group on the WeChat platform using text, pictures, voice, and video. WeChat official accounts have become part of the new media industry, and collecting big data on the content they publish facilitates analysis of the interaction between developers or merchants and users.
At present, when big data is collected, information such as the article text, release time, author, and feature codes is mainly obtained by parsing the source code of the WeChat article. However, some of the feature code information in the source code may fail to be parsed, so the obtained feature code information is incomplete and a second acquisition operation has to be performed by other means. This consumes excessive processing cost and ultimately reduces the efficiency of big data collection.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for extracting feature codes of WeChat articles, and mainly aims to optimize a processing procedure for acquiring feature code information of WeChat articles, ensure integrity of the acquired feature code information, and improve acquisition efficiency.
In order to achieve the above purpose, the present invention mainly provides the following technical solutions:
In one aspect, the invention provides a method for extracting feature codes of WeChat articles, which comprises the following steps:
acquiring a browsing link of a WeChat article;
judging whether the browsing link is a permanent link;
and if so, extracting the feature codes of the WeChat article from the permanent link.
Optionally, there are multiple feature codes, and extracting the feature codes of the WeChat article from the permanent link includes:
determining corresponding extraction sequence rules of a plurality of feature codes in the permanent links;
acquiring element composition rules corresponding to each feature code;
constructing a regular expression for extracting the feature codes according to the extraction sequence rule and the element composition rule;
traversing the character string information contained in the permanent link, searching the character string matched with the regular expression, determining the matched character string as the feature code of the WeChat article, and extracting the feature code of the WeChat article.
Optionally, before the obtaining the browsing link of the WeChat article, the method further includes:
acquiring a historical browsing link of a WeChat article, wherein the historical browsing link is a permanent link;
analyzing the historical browsing link to obtain character string information contained in the historical browsing link; and/or,
searching character strings respectively containing preset elements in character string information contained in the historical browsing link to obtain a plurality of feature codes;
determining the arrangement sequence of a plurality of feature codes in the character string information;
constructing an extraction sequence rule corresponding to the feature codes according to the arrangement sequence;
and/or,
respectively determining preset elements contained in each feature code;
and generating element composition rules corresponding to each feature code according to the preset elements.
Optionally, there are multiple feature codes, and extracting the feature codes of the WeChat article from the permanent link includes:
determining the arrangement order of the plurality of feature codes;
determining, according to the arrangement order, the first character of the feature code that is first in the arrangement order and the special character that follows the feature code that is last in the arrangement order;
searching the first character and the special character in the permanent link, and respectively verifying whether a character located at a first preset position behind the first character is a first preset character and verifying whether a character located at a second preset position before the special character is a second preset character after the first character and the special character are searched;
if the verification is successful, determining the character string between the first character and the special character as a field contained by a plurality of feature codes;
and extracting fields contained in the plurality of feature codes, and matching the fields with preset elements contained in each feature code to obtain each feature code.
Optionally, before the determining whether the browsing link is a permanent link, the method further includes:
acquiring character string information contained in a browsing link of the WeChat article;
and deleting redundant character string information from the character string information.
Optionally, the feature codes at least include: a first feature code identifying a WeChat official account, a second feature code identifying the number of a pushed message, a third feature code identifying the position of the WeChat article within the message, and a fourth feature code identifying a random encrypted string corresponding to the WeChat article.
Optionally, if the browsing link is not a permanent link, the method further includes:
converting the browsing link into a permanent link according to a preset rule; and/or,
re-acquiring a permanent link through which the WeChat article can be accessed.
In another aspect, the invention also provides a device for acquiring feature code information of WeChat articles, which comprises the following components:
the acquisition unit is used for acquiring the browsing link of a WeChat article;
a judging unit, configured to judge whether the browsing link acquired by the acquiring unit is a permanent link;
and the extracting unit is used for extracting the feature codes of the WeChat articles from the permanent links when the judging unit judges that the browsing links are the permanent links.
Optionally, there are multiple feature codes, and the extracting unit includes:
the determining module is used for determining corresponding extraction sequence rules of the feature codes in the permanent links;
the acquisition module is used for acquiring element composition rules respectively corresponding to each feature code;
the construction module is used for constructing a regular expression for extracting the feature codes according to the extraction sequence rule determined by the determination module and the element composition rule obtained by the acquisition module;
the searching module is used for traversing the character string information contained in the permanent link and searching the character string matched with the regular expression;
and the extraction module is used for determining the matched character strings as the feature codes of the WeChat articles and extracting the feature codes of the WeChat articles.
Optionally, the apparatus further includes:
the acquisition unit is further used for acquiring a historical browsing link of the WeChat article before the browsing link of the WeChat article is acquired, wherein the historical browsing link is a permanent link;
the analysis unit is used for analyzing the historical browsing link to obtain character string information contained in the historical browsing link;
the searching unit is used for searching character strings respectively containing preset elements in the character string information contained in the historical browsing link to obtain a plurality of feature codes;
a determination unit configured to determine an arrangement order of the plurality of feature codes in the character string information;
the construction unit is used for constructing an extraction sequence rule corresponding to the feature codes according to the arrangement sequence;
the determining unit is further configured to determine preset elements included in each feature code respectively;
and the generating unit is used for generating element composition rules corresponding to each feature code according to the preset elements.
Optionally, there are multiple feature codes, and the extracting unit further includes:
the determining module is configured to determine an arrangement order of the feature codes;
the determining module is further configured to determine, according to the arrangement order, a first character of the feature code that is first in the arrangement order and a last character of the feature code that is last in the arrangement order;
the verification module is used for searching the first character and the last character in the permanent link, and respectively verifying whether a character located at a first preset position behind the first character is a first preset character and whether a character located at a second preset position before the last character is a second preset character after the first character and the last character are searched;
the determining module is further used for determining a character string between the first character and the last character as fields contained by a plurality of feature codes if the verification is successful;
the extraction module is further configured to extract fields included in the feature codes, and match the fields with preset elements included in each feature code to obtain each feature code.
Optionally, the apparatus further comprises:
the acquiring unit is further configured to acquire character string information included in a browsing link of the WeChat article before the judging unit judges whether the browsing link is a permanent link;
and the deleting unit is used for deleting the redundant character string information from the character string information.
Optionally, the feature codes at least include: a first feature code identifying a WeChat official account, a second feature code identifying the number of a pushed message, a third feature code identifying the position of the WeChat article within the message, and a fourth feature code identifying a random encrypted string corresponding to the WeChat article.
Optionally, if the browsing link is not a permanent link, the apparatus further includes:
the conversion unit is used for converting the browsing link into a permanent link according to a preset rule; and/or,
the acquiring unit is further used for re-acquiring the permanent link which can access the WeChat article.
In still another aspect, the present invention further provides a storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus where the storage medium is located is controlled to execute the feature code extraction method of the WeChat article as described above.
In yet another aspect, the present invention also provides an electronic device comprising at least one processor, and at least one memory, a bus connected to the processor;
the processor and the memory complete mutual communication through the bus;
the processor is used for calling the program instructions in the memory to execute the feature code extraction method of the WeChat article.
Through the above, the technical scheme provided by the invention has at least the following advantages:
The invention provides a method and a device for extracting feature codes of WeChat articles. Compared with the prior art, the invention solves the problem of low efficiency caused by the incomplete feature code information obtained from source code parsing and the resulting need for a supplementary acquisition operation. A single extraction operation obtains complete feature code information from a permanent link, which optimizes the process of acquiring WeChat article feature code information, ensures the completeness of the acquired information, and improves acquisition efficiency.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a flowchart of a feature code extraction method for a WeChat article according to an embodiment of the present invention;
Fig. 2 is a flowchart of another feature code extraction method for a WeChat article according to an embodiment of the present invention;
Fig. 3 is a block diagram of a feature code extraction apparatus for a WeChat article according to an embodiment of the present invention;
Fig. 4 is a block diagram of another feature code extraction apparatus for a WeChat article according to an embodiment of the present invention;
Fig. 5 shows an electronic device for feature code extraction of a WeChat article according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
An embodiment of the invention provides a method for extracting feature codes of a WeChat article. As shown in Fig. 1, the method extracts the feature codes of the WeChat article, together with the code value corresponding to each feature code, directly from a permanent link. The embodiment provides the following specific steps:
101. Acquire the browsing link of the WeChat article.
In the embodiment of the invention, a developer or merchant registers an official account on the WeChat official accounts platform and can publish WeChat articles through it; a WeChat article may contain text, pictures, voice, video, and the like. An instruction to open the WeChat article in a browser can be triggered, and the browsing link corresponding to the article is obtained when the browser opens it.
As an optimization, a control that encapsulates a simulated click operation can be written in advance; starting the control automatically opens the WeChat article in a browser and acquires the browsing link. Alternatively, crawler software can be used to crawl the browsing link of the WeChat article.
The acquired browsing links include links beginning with "http://" and links beginning with "https://", which avoids missing browsing links because of the different protocols.
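The scheme check described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_browsing_link` is hypothetical and does not appear in the patent, and the sketch assumes that a case-insensitive prefix test is sufficient.

```python
def is_browsing_link(link: str) -> bool:
    """Accept links beginning with "http://" or "https://" (case-insensitive)."""
    return link.strip().lower().startswith(("http://", "https://"))
```

Accepting both prefixes in one test means that a link collected over either protocol is kept for the later steps.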
102. Judge whether the browsing link of the WeChat article is a permanent link.
The browsing link of a WeChat article can be a temporary link or a permanent link. For a temporary link, the WeChat backend can preset a validity period and an upper limit on the number of views, so that the temporary link becomes invalid and can no longer be accessed once a certain time has passed or the number of views of the WeChat article reaches the limit.
A permanent link is a link with a static, absolute address. The WeChat backend does not preset an access time limit or an upper limit on views for the permanent link of a WeChat article, and the article remains accessible through the permanent link unless the WeChat backend deletes the article.
In the embodiment of the present invention, the feature code information contained in a temporary link of a WeChat article is incomplete; for example, it typically contains only the first feature code (i.e., Biz), which identifies the WeChat official account, but not Mid, Idx, or Sn. The feature code information contained in a permanent link, however, is complete. Therefore, after the browsing link of the WeChat article is acquired, whether it is a permanent link must be judged first so that the feature code information can be extracted subsequently.
103. If the browsing link of the WeChat article is judged to be a permanent link, extract the feature codes of the WeChat article from the permanent link.
In the embodiment of the invention, the feature codes of the WeChat article at least include: a feature code (i.e., Biz) identifying the WeChat official account, a feature code (i.e., Mid) identifying the number of the pushed message, a feature code (i.e., Idx) identifying the position of the WeChat article within the message, and a feature code (i.e., Sn) identifying a random encrypted string corresponding to the WeChat article.
For example, the permanent link of a WeChat article is as follows:
https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&chksm=be92d32189e55a3716a44ab6e70fa06671b9ce6a6e451202d21f06d439d04b1b4790f9cec5c4&mpshare=1&scene=1&srcid=&sharer_sharetime=1568949234670&sharer_shareid=4d909ce64954591c250aa10079487374&key=f0b3b85af89c30e9005cc4c7e62db3a187796970348c3f03b560499254bd584a3d5e4245d9faba0855dc57def84983e8708cf74cfbe7ad7b1cd4fac44ceda96561160bd08380c43545a0a5017e1da1f9&ascene=1&uin=MjkwNDQ1Njc0MA%3D%3D&devicetype=Windows+7&version=62060841&lang=zh_CN&pass_ticket=IKPy9HeslOQgWOflq4rbHiWSCtajyieZDJgjrZPjVDQlvYkpE07dEiCSZeu%2BGZ5%2F
The embodiment of the invention thus provides a method for extracting the feature codes of a WeChat article. Compared with the prior art, it solves the problem of low efficiency caused by the incomplete feature code information obtained from source code parsing and the resulting need for a supplementary acquisition operation.
To explain the above embodiment in more detail, an embodiment of the present invention further provides another method for extracting feature codes of a WeChat article. As shown in Fig. 2, the method constructs in advance an extraction sequence rule corresponding to the plurality of feature codes and an element composition rule corresponding to each feature code, and generates from them a regular expression for extracting feature code information, so that feature code information can be extracted automatically from a permanent link. The embodiment provides the following specific steps:
201. Construct the extraction sequence rule corresponding to the feature codes.
The feature codes of the WeChat article at least include: a feature code (i.e., Biz) identifying the WeChat official account, a feature code (i.e., Mid) identifying the number of the pushed message, a feature code (i.e., Idx) identifying the position of the WeChat article within the message, and a feature code (i.e., Sn) identifying a random encrypted string corresponding to the WeChat article.
In the embodiment of the present invention, this step may specifically include the following:
Firstly, a historical browsing link of the WeChat article is obtained, the historical browsing link being a permanent link, and the historical browsing link is analyzed to obtain the character string information it contains.
In the embodiment of the invention, the relationship among the plurality of feature codes is found by analyzing historical browsing links. To optimize the processing, redundant character string information can be deleted from the historical browsing link as soon as it is acquired. For example, for the permanent link illustrated in step 103, deleting the redundant character string information results in the following browsing link:
https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92
Secondly, character strings that respectively contain the preset elements are searched for in the character string information contained in the historical browsing link to obtain a plurality of feature codes, and the arrangement order of the feature codes in the character string information is determined.
The preset elements comprise index character elements, letter elements and character elements. In the embodiment of the invention, the feature codes corresponding to different preset elements are searched in the character string information contained in the history browsing link according to the difference of the preset elements contained in different feature codes.
For example: Biz is composed of upper- and lower-case letters, digits, and equals signs; Mid is composed entirely of digits; Idx is a single digit; and Sn is composed of digits and lower-case letters.
In the embodiment of the present invention, the search proceeds from the first character string to the last character string contained in the character string information, so the order in which the feature codes are found gives their arrangement order in the historical browsing link.
For example, the simplified browsing link is as follows:
https://mp.weixin.qq.com/s?__biz={Biz}&mid={Mid}&idx={Idx}&sn={Sn}
The arrangement order of the plurality of feature codes is thereby obtained: "Biz, Mid, Idx, Sn".
Finally, an extraction sequence rule corresponding to the feature codes is constructed according to the arrangement order. The extraction sequence rule states that, in the character string information contained in the browsing link of any WeChat article, the feature codes are positioned in the order "Biz, Mid, Idx, Sn"; correspondingly, when the feature code extraction operation is executed on any permanent link, the feature codes are extracted in the order "Biz, Mid, Idx, Sn".
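As a minimal sketch of how the arrangement order could be read off a historical permanent link, the Python snippet below scans the query string from left to right and records the known parameter names in the order they are met. The function name `arrangement_order` and the choice to identify feature codes by their parameter names (`__biz`, `mid`, `idx`, `sn`) are assumptions made for illustration; the patent itself identifies them by their preset elements.

```python
def arrangement_order(link, keys=("sn", "idx", "mid", "__biz")):
    """Return the feature-code parameter names in the order they occur in the link."""
    query = link.split("?", 1)[-1]                       # text after the first "?"
    names = [pair.split("=", 1)[0] for pair in query.split("&")]
    return [name for name in names if name in keys]      # keep link order, not keys order


history_link = (
    "https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042"
    "&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92"
)
ORDER = arrangement_order(history_link)
```

The default `keys` tuple is deliberately listed in a different order to show that the result follows the link, not the lookup table.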
202. Generate the element composition rule corresponding to each feature code.
It should be noted that, in this step, the elements that make up each feature code are also obtained by analyzing the historical browsing link. In the embodiment of the present invention, this step may specifically include the following:
Firstly, the preset elements contained in each feature code are determined respectively.
For example: Biz is composed of three kinds of preset elements, namely upper- and lower-case letters, digits, and equals signs; Mid is composed entirely of digits; Idx is a single digit; and Sn is composed of digits and lower-case letters.
Secondly, the element composition rule corresponding to each feature code is generated according to its preset elements.
It should be noted that, for a permanent link of a WeChat article acquired in real time, combining the extraction sequence rule with the element composition rules makes it possible to determine the positions of the different feature codes in the permanent link, and the feature code information can then be extracted on that basis.
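One way to write the element composition rules down is as one character-class pattern per feature code. The Python sketch below is an illustration under assumptions: the dictionary `ELEMENT_RULES` and the helper `conforms` are hypothetical names, and the patterns are a literal reading of the example above (the regular expression given later in step 205 is somewhat looser, for instance it allows more than one digit for Idx).

```python
import re

# One pattern per feature code, keyed by the parameter name used in the link.
ELEMENT_RULES = {
    "__biz": r"[A-Za-z0-9=]+",  # upper- and lower-case letters, digits, equals signs
    "mid": r"[0-9]+",           # digits only
    "idx": r"[0-9]",            # a single digit
    "sn": r"[0-9a-z]+",         # digits and lower-case letters
}


def conforms(name, value):
    """Check whether a code value consists only of the preset elements for its feature code."""
    return re.fullmatch(ELEMENT_RULES[name], value) is not None
```

For the example link, `conforms("__biz", "MjM5MzAxNjkwMA==")` and `conforms("mid", "2650476042")` both hold, while a Mid value containing a letter is rejected.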
203. Acquire the browsing link of the WeChat article.
For a description of this step, refer to step 101; details are not repeated here.
It should be noted that, to optimize the processing of the browsing link, redundant character string information can be deleted from the browsing link as soon as it is acquired. For example, for the permanent link illustrated in step 103, deleting the redundant character string information results in the following browsing link:
https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&
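The deletion of redundant character string information might look like the following Python sketch. It assumes that "redundant" means every query parameter other than the four feature codes, which is an interpretation of the example rather than something the patent states; `strip_redundant` and `FEATURE_KEYS` are hypothetical names. The query is split by hand instead of being decoded so that the code values are kept byte for byte.

```python
from urllib.parse import urlsplit, urlunsplit

FEATURE_KEYS = ("__biz", "mid", "idx", "sn")


def strip_redundant(link):
    """Drop every query parameter except the four feature codes, keeping their order."""
    parts = urlsplit(link)
    kept = [
        pair for pair in parts.query.split("&")
        if pair.split("=", 1)[0] in FEATURE_KEYS
    ]
    # Rebuild the link with the reduced query and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "&".join(kept), ""))
```

Unlike the shortened link shown above, this sketch does not leave a trailing "&"; a caller that relies on the trailing separator would need to append it.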
204. Judge whether the browsing link of the WeChat article is a permanent link.
In the embodiment of the present invention, whether the link is a permanent link may be determined by, but is not limited to, checking whether the feature codes Biz, Mid, Idx, and Sn all appear in the character string information contained in the browsing link of the WeChat article. If they do not, the browsing link is converted into a permanent link according to a preset rule, which can be defined in advance as required or carried out with third-party conversion software; alternatively, a permanent link through which the WeChat article can be accessed is re-acquired.
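The coexistence check can be sketched as a set comparison over the query parameter names. The function name `is_permanent_link` is hypothetical, and the sketch assumes the feature codes appear as the query parameters `__biz`, `mid`, `idx`, and `sn`, as in the example link; the conversion of a non-permanent link is not shown because the patent leaves the preset rule unspecified.

```python
REQUIRED = {"__biz", "mid", "idx", "sn"}


def is_permanent_link(link):
    """Treat a link as permanent only if all four feature codes are present in its query."""
    query = link.split("?", 1)[1] if "?" in link else ""
    present = {pair.split("=", 1)[0] for pair in query.split("&") if "=" in pair}
    return REQUIRED <= present   # subset test: every required name was found
```

A link that carries only `__biz`, as the text says is typical of a temporary link, fails the subset test and would be routed to conversion or re-acquisition.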
205. If the browsing link of the WeChat article is judged to be a permanent link, extract the feature codes of the WeChat article from the permanent link.
In the embodiment of the present invention, this step specifically includes the following:
Firstly, the extraction sequence rule corresponding to the plurality of feature codes in the permanent link is determined, and the element composition rule corresponding to each feature code is acquired. For how the extraction sequence rule is constructed and the element composition rules are generated, refer to steps 201 and 202; details are not repeated here.
Secondly, a regular expression for extracting the feature codes is constructed according to the extraction sequence rule and the element composition rules; the character string information contained in the permanent link is traversed to search for the character strings that match the regular expression; and the matched character strings are determined to be the feature codes of the WeChat article and are extracted.
For example: the feature code order "Biz, Mid, Idx, Sn" is obtained from the extraction sequence rule, and the element composition rules give: Biz is composed of upper- and lower-case letters, digits, and equals signs; Mid is composed entirely of digits; Idx is a single digit; and Sn is composed of digits and lower-case letters.
So a regular expression is constructed:
__biz=([\w\=]+)&mid=(\d+)&idx=(\d+)&sn=(\w+)
According to the regular expression, the search-and-match operation for the feature codes can be executed in a programming language to obtain the code value corresponding to each feature code.
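As a hedged illustration of this step, the Python sketch below assembles the regular expression from an order list and a per-code pattern table and then applies it to a permanent link. The names `ORDER`, `ELEMENT_RULES`, `build_pattern`, and `extract` are hypothetical; the per-code patterns are copied from the regular expression above, and the named groups are an addition for readability.

```python
import re

ORDER = ["__biz", "mid", "idx", "sn"]                       # extraction sequence rule
ELEMENT_RULES = {"__biz": r"[\w=]+", "mid": r"\d+",         # element composition rules,
                 "idx": r"\d+", "sn": r"\w+"}               # as in the expression above


def build_pattern(order, rules):
    """Join one 'name=(value pattern)' piece per feature code with '&', in order."""
    pieces = []
    for name in order:
        group = name.strip("_")                             # "__biz" -> group "biz"
        pieces.append(f"{re.escape(name)}=(?P<{group}>{rules[name]})")
    return re.compile("&".join(pieces))


PATTERN = build_pattern(ORDER, ELEMENT_RULES)


def extract(link):
    """Return the code values keyed by feature code, or None if the link does not match."""
    match = PATTERN.search(link)
    return match.groupdict() if match else None
```

Calling `extract` on the example permanent link yields the Biz, Mid, Idx, and Sn values in one pass, while a link that lacks any of the four codes returns `None`.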
Furthermore, another specific implementation is provided for extracting the feature codes of the WeChat article from the permanent link:
First, determine the arrangement order of the plurality of feature codes, for example: "Biz, Mid, Idx, Sn".
Next, according to the arrangement order, determine the first character of the feature code that is arranged first and the special character that follows the feature code that is arranged last.
For example, the permanent link of a WeChat article is as follows:
https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&chksm=be92d32189e55a3716a44ab6e70fa06671b9ce6a6e451202d21f06d439d04b1b4790f9cec5c4&mpshare=1&scene=1&srcid=&sharer_sharetime=1568949234670&sharer_shareid=4d909ce64954591c250aa10079487374&key=f0b3b85af89c30e9005cc4c7e62db3a187796970348c3f03b560499254bd584a3d5e4245d9faba0855dc57def84983e8708cf74cfbe7ad7b1cd4fac44ceda96561160bd08380c43545a0a5017e1da1f9&ascene=1&uin=MjkwNDQ1Njc0MA%3D%3D&devicetype=Windows+7&version=62060841&lang=zh_CN&pass_ticket=IKPy9HeslOQgWOflq4rbHiWSCtajyieZDJgjrZPjVDQlvYkpE07dEiCSZeu%2BGZ5%2F
Analyzing this permanent link shows that the first character of the first-ordered feature code "biz" is "b", and that the special character following the last-ordered feature code "sn" is "&". The first character and the special character are therefore searched for in the permanent link, and the following character string information is found:
biz=MjM5MzAxNjkwMA==&mid=2650476042&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&
Second, verify whether the character located at a first preset position after the first character is a first preset character, and whether the character located at a second preset position before the special character is a second preset character.
The first preset position or the second preset position may be determined according to a character number. For example, the character number of a character is obtained from its ordinal position in the character string information, and this number determines the position of the character in the character string information.
In an embodiment of the present invention, this verification operation is as follows: within the character string information found in the permanent link between the first character and the special character, verify whether the character at a designated position matches a preset character. Specifically, the designated position may be a position selected after the first character of the first-ordered feature code once that character has been found, or a position selected before the special character that follows the last-ordered feature code once that special character has been found.
For example, in the following character string information, verify whether the character at the 18th position after the first character "b" of the first-ordered feature code "biz" is "=", and whether the character at the 32nd position before the special character "&" that follows the last-ordered feature code "sn" is "=".
biz=MjM5MzAxNjkwMA==&mid=2650476042&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&
Finally, if the verification succeeds, determine the character string between the first character and the special character as the fields contained in the plurality of feature codes, extract those fields, and match them against the preset elements contained in each feature code to obtain each feature code.
The element composition differs from one feature code to another. For example: Biz is composed of three kinds of preset elements, namely upper-case and lower-case letters, digits, and equals signs; Mid is composed entirely of digit preset elements; Idx is a single-digit preset element; Sn is composed of digit and lower-case letter preset elements.
In the embodiment of the present invention, according to these different preset element compositions, the field contained in each feature code is found by matching within the character string information located between the first character of the first-ordered feature code and the special character that follows the last-ordered feature code, thereby obtaining the plurality of feature codes.
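The boundary-and-verification procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `extract_by_boundaries`, the way the special character is located (the first "&" after `&sn=`), and the rule table are hypothetical. Positions are counted as index distances from the boundary character; under that convention the "=" before the `sn` value in the sample lies 33 characters before the trailing "&", so the sketch uses 33 as its default, whereas the description states the 32nd position and may count differently.

```python
import re
from typing import Dict, Optional

ORDER = ("biz", "mid", "idx", "sn")  # arrangement order of the feature codes

# Element composition rules taken from the description.
ELEMENT_RULES = {
    "biz": re.compile(r"[A-Za-z0-9=]+"),  # letters, digits, equals signs
    "mid": re.compile(r"\d+"),            # digits only
    "idx": re.compile(r"\d"),             # a single digit
    "sn": re.compile(r"[0-9a-z]+"),       # digits and lower-case letters
}


def extract_by_boundaries(link: str, first_offset: int = 18,
                          second_offset: int = 33,
                          preset_char: str = "=") -> Optional[Dict[str, str]]:
    # First character of the first-ordered feature code.
    start = link.find(ORDER[0] + "=")
    last_key = link.find("&" + ORDER[-1] + "=", start)
    if start < 0 or last_key < 0:
        return None
    # Special character "&" that follows the last-ordered feature code.
    end = link.find("&", last_key + 1)
    if end < 0:
        return None
    # Verify the preset characters at the two preset positions.
    first_pos, second_pos = start + first_offset, end - second_offset
    if not (start < first_pos < end and start < second_pos < end):
        return None
    if link[first_pos] != preset_char or link[second_pos] != preset_char:
        return None
    # Match each field between the boundaries against its element rule.
    codes: Dict[str, str] = {}
    for field in link[start:end].split("&"):
        name, _, value = field.partition("=")
        rule = ELEMENT_RULES.get(name)
        if rule is None or rule.fullmatch(value) is None:
            return None
        codes[name] = value
    return codes if tuple(codes) == ORDER else None


sample_link = (
    "https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042"
    "&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&chksm=be92d321"
)
result = extract_by_boundaries(sample_link)
```

The two offsets are parameters rather than constants because they depend on the lengths of the `biz` and `sn` values and on the counting convention that is chosen.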
Further, as an implementation of the methods shown in fig. 1 and fig. 2, an embodiment of the present invention provides an apparatus for extracting feature codes of a WeChat article. The apparatus embodiment corresponds to the method embodiment; for ease of reading, the details of the method embodiment are not repeated one by one here, but it should be clear that the apparatus in this embodiment can implement everything described in the method embodiment. The apparatus is used to extract feature code information of WeChat articles from permanent links and, as shown in fig. 3, comprises:
an obtaining unit 301, configured to obtain a browsing link of a WeChat article;
a judging unit 302, configured to judge whether the browsing link obtained by the obtaining unit 301 is a permanent link;
an extracting unit 303, configured to extract a feature code of the WeChat article from the permanent link when the judging unit 302 judges that the browsing link is a permanent link.
Further, as shown in fig. 4, the feature codes are multiple, and the extracting unit 303 includes:
a determining module 3031, configured to determine corresponding extraction order rules of the feature codes in the permanent link;
an obtaining module 3032, configured to obtain an element composition rule corresponding to each feature code;
a constructing module 3033, configured to construct a regular expression for extracting the feature codes according to the extraction sequence rule determined by the determining module 3031 and the element composition rule obtained by the obtaining module 3032;
a searching module 3034, configured to traverse the character string information included in the permanent link, and search for a character string matching the regular expression;
an extracting module 3035, configured to determine the matched character string as the feature code of the WeChat article, and extract the feature code of the WeChat article.
Further, as shown in fig. 4, the apparatus further includes:
the obtaining unit 301 is further configured to obtain a historical browsing link of a WeChat article before obtaining the browsing link of the WeChat article, where the historical browsing link is a permanent link;
an analyzing unit 304, configured to analyze the historical browsing link to obtain character string information included in the historical browsing link;
a searching unit 305, configured to search, in the character string information included in the history browsing link, character strings respectively including preset elements to obtain a plurality of feature codes;
a determining unit 306, configured to determine an arrangement order of the feature codes in the character string information;
a constructing unit 307, configured to construct, according to the arrangement order, an extraction order rule corresponding to the feature code;
the determining unit 306 is further configured to determine preset elements included in each of the feature codes respectively;
the generating unit 308 is configured to generate an element composition rule corresponding to each feature code according to the preset element.
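As a rough illustration of how units 304 to 308 might cooperate, the following Python sketch derives an extraction sequence rule and element composition rules from one historical permanent link and joins them into a regular expression. The names `PRESET_NAMES`, `element_class` and `build_pattern` are hypothetical, and inferring a composition rule from a single sample value is a simplifying assumption; the description does not specify how the rules are generalized.

```python
import re
from urllib.parse import urlsplit

# Assumed parameter names of the feature codes to look for.
PRESET_NAMES = ("__biz", "mid", "idx", "sn")


def element_class(value: str) -> str:
    """Map a sample code value to a regex fragment describing its elements."""
    if re.fullmatch(r"\d+", value):
        return r"\d+"            # digits only
    if re.fullmatch(r"[0-9a-z]+", value):
        return r"[0-9a-z]+"      # digits and lower-case letters
    return r"[\w=]+"             # letters, digits, equals signs


def build_pattern(historical_link: str) -> str:
    """Build a regular expression whose group order follows the link."""
    query = urlsplit(historical_link).query
    parts = []
    for field in query.split("&"):
        name, _, value = field.partition("=")
        if name in PRESET_NAMES:
            # The order in which the names appear is the extraction sequence rule.
            parts.append(name + "=(" + element_class(value) + ")")
    # Joining with "&" assumes the feature codes are adjacent in the link.
    return "&".join(parts)


historical_link = (
    "https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042"
    "&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&chksm=be92d321"
)
pattern = build_pattern(historical_link)
```

The resulting pattern can then be passed to `re.search` to extract code values from other permanent links that follow the same parameter order.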
Further, as shown in fig. 4, the feature codes are multiple, and the extracting unit 303 further includes:
the determining module 3031 is configured to determine an arrangement order of the plurality of feature codes;
the determining module 3031 is further configured to determine, according to the arrangement order, a first character of the feature code that is arranged first and a special character that follows the feature code that is arranged last;
a verification module 3036, configured to search for the first character and the special character in the permanent link and, after they are found, to verify whether a character located at a first preset position after the first character is a first preset character and whether a character located at a second preset position before the special character is a second preset character;
the determining module 3031 is further configured to determine, if the verification is successful, the character string between the first character and the special character as the fields included in the plurality of feature codes;
the extracting module 3035 is further configured to extract fields included in the plurality of feature codes, and match the fields with preset elements included in each feature code to obtain each feature code.
Further, as shown in fig. 4, the apparatus further includes:
the obtaining unit 301 is further configured to obtain, before the determining whether the browsing link is a permanent link, character string information included in the browsing link of the WeChat article;
a deleting unit 309 configured to delete redundant character string information from the character string information.
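One possible reading of the deleting unit 309 is sketched below in Python. Which character string information counts as redundant is not defined at this point in the description, so the sketch assumes that every query parameter other than the four feature codes, as well as any fragment, is redundant; the names `KEEP` and `strip_redundant` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these four query parameters are needed afterwards.
KEEP = ("__biz", "mid", "idx", "sn")


def strip_redundant(link: str) -> str:
    """Drop query parameters outside KEEP and any fragment from the link."""
    parts = urlsplit(link)
    kept = [
        field for field in parts.query.split("&")
        if field.partition("=")[0] in KEEP
    ]
    # The query is rebuilt by hand so that "=" inside values stays unescaped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "&".join(kept), ""))


browsing_link = (
    "https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042"
    "&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&chksm=be92d321&scene=1#rd"
)
cleaned = strip_redundant(browsing_link)
```

Shortening the link before the permanent-link judgment keeps the later search and matching steps from scanning parameters that carry no feature code.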
Further, as shown in fig. 4, the feature codes at least include: a first feature code used for identifying a WeChat public account, a second feature code used for identifying the number of a pushed message, a third feature code used for identifying the position of a WeChat article within the message, and a fourth feature code used for identifying the random encryption string corresponding to a WeChat article.
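The four feature codes can be pictured as a small record type. The Python sketch below is an illustration only: the class name `FeatureCodes`, the function `parse_feature_codes`, and the mapping of the first to fourth feature codes onto the `__biz`, `mid`, `idx` and `sn` parameters (inferred from the "Biz, Mid, Idx, Sn" order) are assumptions. It also uses the standard query-string parser instead of the regular expression approach of the description.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class FeatureCodes:
    biz: str  # first feature code: identifies the WeChat public account
    mid: str  # second feature code: number of the pushed message
    idx: int  # third feature code: position of the article within the message
    sn: str   # fourth feature code: random encryption string of the article


def parse_feature_codes(link: str) -> Optional[FeatureCodes]:
    """Return the four feature codes, or None when any of them is missing."""
    params = parse_qs(urlsplit(link).query)
    try:
        return FeatureCodes(
            biz=params["__biz"][0],
            mid=params["mid"][0],
            idx=int(params["idx"][0]),
            sn=params["sn"][0],
        )
    except (KeyError, ValueError):
        return None


sample_link = (
    "https://mp.weixin.qq.com/s?__biz=MjM5MzAxNjkwMA==&mid=2650476042"
    "&idx=2&sn=c1cf094f228249bdafbade6c1b0b1a92&chksm=be92d321"
)
feature_codes = parse_feature_codes(sample_link)
```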
Further, as shown in fig. 4, if the browsing link is not a permanent link, the apparatus further includes:
a converting unit 310, configured to convert the browsing link into a permanent link according to a preset rule; and/or,
the obtaining unit 301 is further configured to obtain again a permanent link through which the WeChat article can be accessed.
In summary, embodiments of the present invention provide a method and an apparatus for extracting feature codes of a WeChat article, in which, after the browsing link of the WeChat article is determined to be a permanent link, the feature codes of the WeChat article and the code values corresponding to the feature codes are extracted directly from the permanent link. Compared with the prior art, this solves the problems that the feature code information obtained by analyzing source code is incomplete and that efficiency is low because supplementary operations are needed to obtain the missing feature code information. In addition, in the embodiment of the present invention, an extraction sequence rule corresponding to the plurality of feature codes and an element composition rule corresponding to each feature code are constructed in advance and used to generate a regular expression for extracting feature code information, so that feature code information is extracted automatically from a permanent link.
The characteristic code extracting device of the WeChat article comprises a processor and a memory, wherein the acquiring unit, the judging unit, the extracting unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the process of acquiring the feature code information of a WeChat article is optimized, which ensures the integrity of the acquired feature code information and also improves acquisition efficiency.
The embodiment of the invention provides a storage medium on which a program is stored, and the program, when executed by a processor, implements the feature code extraction method of the WeChat article.
The embodiment of the invention provides a processor, which is used for running a program, wherein the characteristic code extraction method of the WeChat article is executed when the program runs.
An embodiment of the present invention provides an electronic device 40, as shown in fig. 5, the device includes at least one processor 401, and at least one memory 402 and a bus 403 connected to the processor 401; the processor 401 and the memory 402 complete communication with each other through the bus 403; the processor 401 is configured to call program instructions in the memory 402 to execute the above-mentioned feature code extraction method of the WeChat article.
The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes the following method steps:
a method for extracting feature codes of a WeChat article comprises the following steps: acquiring a browsing link of the WeChat article; judging whether the browsing link is a permanent link; and if so, extracting the feature codes of the WeChat article from the permanent link.
Further, there are a plurality of feature codes, and extracting the feature codes of the WeChat article from the permanent link includes: determining the extraction sequence rule that the plurality of feature codes follow in the permanent link; acquiring the element composition rule corresponding to each feature code; constructing a regular expression for extracting the feature codes according to the extraction sequence rule and the element composition rule; traversing the character string information contained in the permanent link, searching for the character strings matched with the regular expression, determining the matched character strings as the feature codes of the WeChat article, and extracting the feature codes of the WeChat article.
Further, before acquiring the browsing link of the WeChat article, the method further includes: acquiring a historical browsing link of a WeChat article, wherein the historical browsing link is a permanent link; analyzing the historical browsing link to obtain the character string information contained in the historical browsing link; and/or searching the character string information contained in the historical browsing link for character strings respectively containing preset elements to obtain a plurality of feature codes; determining the arrangement order of the plurality of feature codes in the character string information; constructing an extraction sequence rule corresponding to the feature codes according to the arrangement order; and/or respectively determining the preset elements contained in each feature code; and generating an element composition rule corresponding to each feature code according to the preset elements.
Further, there are a plurality of feature codes, and extracting the feature codes of the WeChat article from the permanent link includes: determining the arrangement order of the plurality of feature codes; determining, according to the arrangement order, the first character of the feature code that is arranged first and the special character that follows the feature code that is arranged last; searching for the first character and the special character in the permanent link and, after they are found, verifying whether the character located at a first preset position after the first character is a first preset character and whether the character located at a second preset position before the special character is a second preset character; if the verification is successful, determining the character string between the first character and the special character as the fields contained in the plurality of feature codes; and extracting the fields contained in the plurality of feature codes, and matching the fields with the preset elements contained in each feature code to obtain each feature code.
Further, before the determining whether the browsing link is a permanent link, the method further includes: acquiring character string information contained in a browsing link of the WeChat article; and deleting redundant character string information from the character string information.
Further, the feature codes at least include: a first feature code used for identifying a WeChat public account, a second feature code used for identifying the number of a pushed message, a third feature code used for identifying the position of a WeChat article within the message, and a fourth feature code used for identifying the random encryption string corresponding to a WeChat article.
Further, if the browsing link is not a permanent link, the method further comprises: converting the browsing link into a permanent link according to a preset rule; and/or acquiring again a permanent link through which the WeChat article can be accessed.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for extracting feature codes of WeChat articles is characterized by comprising the following steps:
acquiring a browsing link of the WeChat article;
judging whether the browsing link is a permanent link or not;
and if so, extracting the feature codes of the WeChat articles from the permanent links.
2. The method of claim 1, wherein there are a plurality of feature codes, and the extracting the feature codes of the WeChat article from the permanent link comprises:
determining corresponding extraction sequence rules of a plurality of feature codes in the permanent links;
acquiring element composition rules corresponding to each feature code;
constructing a regular expression for extracting the feature codes according to the extraction sequence rule and the element composition rule;
traversing the character string information contained in the permanent link, searching the character string matched with the regular expression, determining the matched character string as the feature code of the WeChat article, and extracting the feature code of the WeChat article.
3. The method of claim 2, wherein prior to the obtaining the browsing link of the WeChat article, the method further comprises:
acquiring a historical browsing link of a WeChat article, wherein the historical browsing link is a permanent link;
analyzing the historical browsing link to obtain character string information contained in the historical browsing link; and/or,
searching character strings respectively containing preset elements in character string information contained in the historical browsing link to obtain a plurality of feature codes;
determining the arrangement sequence of a plurality of feature codes in the character string information;
constructing an extraction sequence rule corresponding to the feature codes according to the arrangement sequence;
and/or,
respectively determining preset elements contained in each feature code;
and generating element composition rules corresponding to each feature code according to the preset elements.
4. The method of claim 1, wherein there are a plurality of feature codes, and the extracting the feature codes of the WeChat article from the permanent link comprises:
determining the arrangement sequence of a plurality of feature codes;
determining a first character of the feature code with the arrangement sequence at the first bit and a special character after the feature code with the arrangement sequence at the last bit according to the arrangement sequence;
searching the first character and the special character in the permanent link, and respectively verifying whether a character located at a first preset position behind the first character is a first preset character and verifying whether a character located at a second preset position before the special character is a second preset character after the first character and the special character are searched;
if the verification is successful, determining the character string between the first character and the special character as a field contained by a plurality of feature codes;
and extracting fields contained in the plurality of feature codes, and matching the fields with preset elements contained in each feature code to obtain each feature code.
5. The method of claim 1, wherein prior to said judging whether said browsing link is a permanent link, said method further comprises:
acquiring character string information contained in a browsing link of the WeChat article;
and deleting redundant character string information from the character string information.
6. The method according to any of claims 1-5, wherein the feature codes comprise at least: a first feature code used for identifying a WeChat public account, a second feature code used for identifying the number of a pushed message, a third feature code used for identifying the position of a WeChat article within the message, and a fourth feature code used for identifying the random encryption string corresponding to a WeChat article.
7. The method of claim 1, wherein if the browsing link is not a permanent link, the method further comprises:
converting the browsing link into a permanent link according to a preset rule; and/or,
acquiring again a permanent link through which the WeChat article can be accessed.
8. An apparatus for extracting feature codes of a WeChat article, the apparatus comprising:
an acquiring unit, configured to acquire a browsing link of the WeChat article;
a judging unit, configured to judge whether the browsing link acquired by the acquiring unit is a permanent link;
and the extracting unit is used for extracting the feature codes of the WeChat articles from the permanent links when the judging unit judges that the browsing links are the permanent links.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the feature code extraction method of the WeChat article according to any one of claims 1-7.
10. An electronic device, comprising at least one processor, and at least one memory and a bus connected to the processor;
the processor and the memory complete mutual communication through the bus;
the processor is configured to call program instructions in the memory to perform the feature code extraction method of the WeChat article according to any one of claims 1-7.
CN201910941622.7A 2019-09-30 2019-09-30 Method and device for extracting feature codes of WeChat article Pending CN112579855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910941622.7A CN112579855A (en) 2019-09-30 2019-09-30 Method and device for extracting feature codes of WeChat article

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910941622.7A CN112579855A (en) 2019-09-30 2019-09-30 Method and device for extracting feature codes of WeChat article

Publications (1)

Publication Number Publication Date
CN112579855A true CN112579855A (en) 2021-03-30

Family

ID=75116280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910941622.7A Pending CN112579855A (en) 2019-09-30 2019-09-30 Method and device for extracting feature codes of WeChat article

Country Status (1)

Country Link
CN (1) CN112579855A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination