CN114417396A - Privacy policy text data extraction method and device, electronic equipment and storage medium - Google Patents

Privacy policy text data extraction method and device, electronic equipment and storage medium

Publication number
CN114417396A
Authority
CN
China
Prior art keywords
privacy policy
hyperlink
text data
display window
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111522146.9A
Other languages
Chinese (zh)
Other versions
CN114417396B (en)
Inventor
陈业炫
刘涛
赵帅
齐向东
吴云坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qi'an Pangu Shanghai Information Technology Co ltd
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qi'an Pangu Shanghai Information Technology Co ltd
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qi'an Pangu Shanghai Information Technology Co ltd, Qianxin Technology Group Co Ltd, Secworld Information Technology Beijing Co Ltd
Priority to CN202111522146.9A
Publication of CN114417396A
Application granted
Publication of CN114417396B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Abstract

The invention provides a privacy policy text data extraction method and device, electronic equipment and a storage medium. The method comprises: when a privacy policy display window appears on a terminal interface and a privacy policy hyperlink exists in the privacy policy display window, acquiring position information, sent by the terminal, of the privacy policy hyperlink on the terminal interface; and clicking the privacy policy hyperlink based on that position information so that the privacy policy text data corresponding to the privacy policy hyperlink is displayed in the privacy policy display window, and extracting the privacy policy text data from the privacy policy display window. The method and device, electronic equipment and storage medium improve the success rate and accuracy of privacy policy text data extraction.

Description

Privacy policy text data extraction method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of mobile applications, and in particular to a privacy policy text data extraction method and device, electronic equipment and a storage medium.
Background
The current mainstream ways of obtaining privacy policy text are: obtaining the privacy policy provided by the developer from an application market; finding the privacy policy text by statically scanning the application files; and extracting the text from the displayed pages by using OCR technology.
These techniques can extract the privacy policy text to some extent, but each has defects. The privacy policy obtained from the application market may differ from the one actually shown to the user while the mobile application is running, because the market copy is not updated in time. Static scanning of application files is likely to fail to extract the privacy policy file, because a large proportion of privacy policies are loaded and displayed via a URL and the file format is not fixed. A mature OCR technology can acquire the privacy policy text fairly accurately, but because a mobile phone screen is small and privacy policy texts are generally long, the complete text can be extracted only by scrolling the page many times, which is time-consuming.
Disclosure of Invention
The invention provides a method and a device for extracting text data of a privacy policy, electronic equipment and a storage medium, which are used for solving the technical problems in the prior art.
The invention provides a privacy policy text data extraction method, which comprises the following steps:
when a privacy policy display window appears on a terminal interface and a privacy policy hyperlink exists in the privacy policy display window, acquiring position information, sent by the terminal, of the privacy policy hyperlink on the terminal interface;
and clicking the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, so that the privacy policy text data corresponding to the privacy policy hyperlink is displayed in the privacy policy display window, and extracting the privacy policy text data from the privacy policy display window.
According to the privacy policy text data extraction method provided by the invention, the method further comprises: when a privacy policy display window appears on the terminal interface and no privacy policy hyperlink exists in the privacy policy display window, extracting the privacy policy text data from the privacy policy display window.
According to the privacy policy text data extraction method provided by the invention, the method further comprises: when no privacy policy display window appears on the terminal interface, displaying the privacy policy display window on the terminal interface in response to a user input, and extracting the privacy policy text data from the privacy policy display window.
According to the privacy policy text data extraction method provided by the invention, extracting the privacy policy text data from the privacy policy display window comprises: traversing all windows under the current terminal interface, acquiring the specified data of all windows that are in the display state, deduplicating and arranging all the acquired specified data, merging the deduplicated and arranged specified data, and obtaining the privacy policy text data from the merged specified data.
According to the privacy policy text data extraction method provided by the invention, deduplicating and arranging all the acquired specified data comprises the following steps:
traversing the tree structures corresponding to the specified data of all windows in the display state;
merging the tree nodes that correspond to the same specified data in different windows, together with the subtrees of those tree nodes;
and merging the tree nodes corresponding to the remaining specified data through the root node DecorView of the window.
According to the privacy policy text data extraction method provided by the invention, the specified data is AccessibilityNodeInfo data;
correspondingly, obtaining the privacy policy text data from the merged specified data comprises:
when the text field in the specified data is not empty, taking the content of the text field as privacy policy text data;
and when the text field in the specified data is empty and the description field is not empty, taking the content of the description field as privacy policy text data.
According to the privacy policy text data extraction method provided by the invention, acquiring the position information, sent by the terminal, of the privacy policy hyperlink on the terminal interface comprises the following steps:
initiating a request to the terminal so that the terminal: acquires start and end point information of the privacy policy hyperlink in the text currently displayed in the privacy policy display window by using the setSpan function of the SpannableString class; converts the start and end point information into position information of the privacy policy hyperlink in the currently displayed text by using the getTextBounds function of the TextPaint class; converts that position information into position information of the privacy policy hyperlink in the text view by using the drawTextRun function of the TextLine class; and converts the position information of the privacy policy hyperlink in the text view into position information of the privacy policy hyperlink on the terminal interface by using the getBoundsOnScreen function of the View class.
The invention also provides a privacy policy text data extraction device, which comprises:
a coordinate information acquisition module, configured to acquire position information, sent by the terminal, of the privacy policy hyperlink on the terminal interface when a privacy policy display window appears on the terminal interface and a privacy policy hyperlink exists in the privacy policy display window;
and a privacy policy extraction module, configured to click the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, so that the privacy policy text data corresponding to the privacy policy hyperlink is displayed in the privacy policy display window, and to extract the privacy policy text data from the privacy policy display window.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of any one of the above-mentioned methods for extracting the text data of the privacy policy.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the privacy policy text data extraction method as described in any one of the above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method for extracting privacy policy text data according to any one of the above-mentioned methods.
According to the method and the device for extracting the text data of the privacy policy, the electronic equipment and the storage medium, the hyperlink of the privacy policy is clicked by using an automatic clicking tool based on the position information of the hyperlink of the privacy policy sent by the terminal, and the text data of the privacy policy in the display window of the privacy policy is finally obtained.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a privacy policy text data extraction method provided by the present invention;
Fig. 2 is a schematic structural diagram of a privacy policy text data extraction device provided by the present invention;
Fig. 3 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a privacy policy text data extraction method provided by the present invention. As shown in Fig. 1, the method includes:
S110, when a privacy policy display window appears on a terminal interface and a privacy policy hyperlink exists in the privacy policy display window, acquiring position information, sent by the terminal, of the privacy policy hyperlink on the terminal interface;
S120, clicking the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, so that the privacy policy text data corresponding to the privacy policy hyperlink is displayed in the privacy policy display window, and extracting the privacy policy text data from the privacy policy display window.
It should be noted that privacy policy text is displayed in one of two ways. In the first, the entire privacy policy text is shown directly in the privacy policy display window. In the second, only a summary of the privacy policy is shown and the remaining content is reachable through a hyperlink embedded in the summary; to obtain the complete privacy policy text, the user must click the hyperlink in the summary to enter a new interface.
When a privacy policy display window appears on the terminal interface and a privacy policy hyperlink exists in the window, the automatic clicking tool receives, from the terminal, the coordinate information of the privacy policy hyperlink on the terminal interface. The automatic clicking tool may be UiAutomator, an open-source automated clicking tool for Android that is mainly used for testing. In this embodiment, the privacy policy hyperlink is identified by preset keywords: a hyperlink in the privacy policy display window that contains a preset keyword is determined to be a privacy policy hyperlink. The preset keywords are any one or more of the following: privacy policy, privacy agreement, privacy statement, privacy terms, privacy protection policy, privacy protection agreement, privacy protection statement, privacy protection terms, privacy information protection, privacy.
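The keyword test described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `is_privacy_policy_hyperlink` is hypothetical, the keyword list uses the English renderings given in this translation (the original keywords are presumably Chinese), and matching by case-insensitive substring is an assumption.

```python
# English renderings of the preset keywords listed in this embodiment
# (illustrative only; the original keyword list is presumably Chinese).
PRESET_KEYWORDS = (
    "privacy policy", "privacy agreement", "privacy statement",
    "privacy terms", "privacy protection policy",
    "privacy protection agreement", "privacy protection statement",
    "privacy protection terms", "privacy information protection",
    "privacy",
)


def is_privacy_policy_hyperlink(link_text, keywords=PRESET_KEYWORDS):
    """Return True if the hyperlink text contains any preset keyword.

    Assumption: matching is a case-insensitive substring test.
    """
    normalized = link_text.casefold()
    return any(keyword in normalized for keyword in keywords)
```

Because the bare keyword "privacy" is in the list, any hyperlink text that mentions privacy at all matches; the longer phrases matter only if that last keyword is dropped from the configured set.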
The automatic clicking tool clicks the privacy policy hyperlink based on the coordinate information sent by the terminal, which brings up the interface containing the complete privacy policy text data in the privacy policy display window; the tool then extracts the complete privacy policy text data and saves it as a local file.
In the privacy policy text data extraction method provided by the invention, an automatic clicking tool clicks the privacy policy hyperlink based on the position information of the hyperlink sent by the terminal, and the privacy policy text data in the privacy policy display window is finally obtained, which improves the success rate and accuracy of privacy policy text data extraction. Compared with the prior art, the method also reduces the cost of acquiring privacy policy text data.
According to the privacy policy text data extraction method provided by the invention, the method further comprises: when a privacy policy display window appears on the terminal interface and no privacy policy hyperlink exists in the privacy policy display window, extracting the privacy policy text data from the privacy policy display window.
As described above, when a privacy policy display window exists on the terminal interface and no privacy policy hyperlink exists in it, the privacy policy text data in the privacy policy display window is obtained directly by using AccessibilityNodeInfo in the UiAutomator tool. AccessibilityNodeInfo is a class that the UiAutomator tool uses for controlling automatic clicking and obtaining View information, and through it essentially all View information in the Android system can be obtained.
In the privacy policy text data extraction method provided by the invention, AccessibilityNodeInfo in UiAutomator is used to obtain the privacy policy text data from a privacy policy window that contains no privacy policy hyperlink. This adds another way of obtaining privacy policy text data and allows the extraction mode to be chosen flexibly according to how the privacy policy text data is presented.
According to the privacy policy text data extraction method provided by the invention, the method further comprises:
when no privacy policy display window appears on the terminal interface, displaying the privacy policy display window on the terminal interface in response to a user input, and extracting the privacy policy text data from the privacy policy display window.
It should be noted that, when no privacy policy display window exists on the terminal interface, the user input specifically includes: entering the "My" interface or the registration/login interface, and clicking a preset keyword on that interface. In response to the user input, the terminal enters the privacy policy display window, the privacy policy text data in the window is acquired through AccessibilityNodeInfo in UiAutomator, and the acquired text data is saved as a local file. The preset keywords here are the same as the preset keywords described above.
In the privacy policy text data extraction method provided by the invention, the interface containing the privacy policy text data is reached, and the text extracted, by clicking in sequence the "My" interface and a preset keyword on it, or the registration/login interface and a preset keyword on it. This process targets the case where no privacy policy display window exists on the terminal interface, so that the privacy policy text data can still be extracted accurately in that case, which improves the success rate of obtaining the text data.
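The three cases covered so far (hyperlink present, window present without hyperlink, no window) can be summarized as a small decision function. This is only a sketch of the branching logic; the names `ExtractionPath` and `choose_extraction_path` are hypothetical and do not appear in the source.

```python
from enum import Enum


class ExtractionPath(Enum):
    """Hypothetical labels for the three extraction cases in the description."""
    CLICK_HYPERLINK_THEN_EXTRACT = "click the hyperlink, then extract"
    EXTRACT_DIRECTLY = "extract from the window directly"
    NAVIGATE_THEN_EXTRACT = "open the window via user input, then extract"


def choose_extraction_path(window_shown, hyperlink_present):
    """Pick the extraction path from the two conditions the method checks."""
    if not window_shown:
        # No privacy policy display window: navigate via the "My" or
        # registration/login interface first.
        return ExtractionPath.NAVIGATE_THEN_EXTRACT
    if hyperlink_present:
        # Window shows only a summary containing a privacy policy hyperlink.
        return ExtractionPath.CLICK_HYPERLINK_THEN_EXTRACT
    # Window already shows the full text.
    return ExtractionPath.EXTRACT_DIRECTLY
```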
According to the privacy policy text data extraction method provided by the invention, extracting the privacy policy text data from the privacy policy display window comprises the following steps:
traversing all windows under the current terminal interface, acquiring the specified data of all windows that are in the display state, deduplicating and arranging all the acquired specified data, merging the deduplicated and arranged specified data, and obtaining the privacy policy text data from the merged specified data.
It should be noted that the external interface provided by UiAutomator can obtain only the window that has focus. In some cases the privacy policy interface is not the focus window when it is displayed, so UiAutomator cannot obtain the AccessibilityNodeInfo data of the window corresponding to the privacy policy. In the invention, all windows under the current terminal interface are traversed and the specified data of all windows in the display state is obtained, where the specified data is AccessibilityNodeInfo data. AccessibilityNodeInfo data is organized as a tree structure, and the AccessibilityNodeInfo information of multiple windows may be duplicated and out of order. Therefore all the obtained AccessibilityNodeInfo data is deduplicated and arranged, and the deduplicated and arranged data is merged to finally obtain the privacy policy text data.
In the privacy policy text data extraction method provided by the invention, the AccessibilityNodeInfo data of all windows under the current terminal interface is traversed and merged to obtain the final privacy policy text data. This avoids extraction failures caused by the privacy policy display window not being the focus window, and improves the success rate of privacy policy text data extraction.
According to the privacy policy text data extraction method provided by the invention, deduplicating and arranging all the acquired specified data comprises the following steps:
traversing the tree structures corresponding to the specified data of all windows in the display state; merging the tree nodes that correspond to the same specified data in different windows, together with the subtrees of those tree nodes; and merging the tree nodes corresponding to the remaining specified data through the root node DecorView of the window.
It should be noted that a tree structure is a data structure in which data elements have a one-to-many relationship, and is an important nonlinear data structure. In a tree, the root node has no predecessor, every other node has exactly one predecessor, a leaf node has no successors, and every other node may have one or more successors. Here the tree stores View information, with each View acting as a tree node. Each window corresponds to one DecorView, which serves as the root View of that window. Because Views are stored in a tree structure, one window corresponds to one DecorView and all of its child Views, and all changes to the window are dispatched through the DecorView.
In order to output the AccessibilityNodeInfo data of multiple windows in a consistent format, the AccessibilityNodeInfo data of the multiple windows needs to be merged into a single tree structure. First, the tree structures corresponding to the AccessibilityNodeInfo data of all windows are traversed (AccessibilityNodeInfo and View are in one-to-one correspondence and are likewise stored in tree structures) to find the AccessibilityNodeInfo nodes that are the same in different windows. Because such duplicates exist, merging the trees directly at their root nodes would leave multiple repeated tree nodes in the newly generated AccessibilityNodeInfo tree structure, and a traversal of it could enter an infinite loop. Therefore, the repeated AccessibilityNodeInfo nodes and their subtrees are merged first, and the remaining AccessibilityNodeInfo nodes are then merged through the root node DecorView, where merging means transferring all child nodes of a tree node in one tree to another tree as child nodes of that tree.
In the privacy policy text data extraction method provided by the invention, deduplicating and arranging the data acquired from each window effectively prevents the traversal from entering an infinite loop, and also helps ensure the accuracy of the privacy policy text data finally obtained.
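The merge-with-deduplication step can be illustrated with a small tree model. This is a sketch under stated assumptions rather than the patented implementation: `Node` stands in for AccessibilityNodeInfo, the `node_id` field is a hypothetical key for deciding that two nodes in different windows are "the same", and the merged tree is built from fresh copies so that the input trees are left untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                      # hypothetical key used to spot duplicates
    text: str = ""
    children: list = field(default_factory=list)


def merge_children(target, source, seen):
    """Transfer source's children to target, merging nodes already seen."""
    for child in source.children:
        existing = seen.get(child.node_id)
        if existing is None:
            # First occurrence: keep one copy and attach it under target.
            existing = Node(child.node_id, child.text)
            seen[child.node_id] = existing
            target.children.append(existing)
        # Whether new or duplicate, fold the child's subtree into the kept node.
        merge_children(existing, child, seen)


def merge_window_trees(decor_views):
    """Merge the trees of all displayed windows under a single root."""
    merged_root = Node("DecorView")
    seen = {}
    for decor_view in decor_views:
        merge_children(merged_root, decor_view, seen)
    return merged_root


def preorder_ids(node):
    """List node ids in pre-order; used here only to inspect the result."""
    ids = [node.node_id]
    for child in node.children:
        ids.extend(preorder_ids(child))
    return ids
```

Each distinct `node_id` appears exactly once in the merged tree, so a later traversal visits every node once and terminates.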
According to the privacy policy text data extraction method provided by the invention, the specified data is AccessibilityNodeInfo data;
correspondingly, obtaining the privacy policy text data from the merged specified data comprises:
when the text field in the specified data is not empty, taking the content of the text field as privacy policy text data; and when the text field in the specified data is empty and the description field is not empty, taking the content of the description field as privacy policy text data.
It should be noted that, during privacy policy extraction, it is common to find a View that visibly displays text on the interface but whose text cannot be read. Analysis shows that some SDKs pass the text content of the View into the description field of AccessibilityNodeInfo and leave the text field empty. Therefore, when the text field in AccessibilityNodeInfo is not empty, the text field is used directly as the text content; when the text field is empty and the description field is not empty, the description field is used as the text content (the description field normally holds a description of the View).
In the privacy policy text data extraction method provided by the invention, text is read from the merged AccessibilityNodeInfo data using different fields depending on which field is populated, which improves the success rate of privacy policy text data extraction.
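The text-then-description fallback can be sketched in a few lines. The function names are hypothetical, each node is modeled as a plain `(text, description)` pair rather than a real AccessibilityNodeInfo object, and joining the pieces with newlines is an assumption.

```python
def node_display_text(text, description):
    """Prefer the text field; fall back to the description field."""
    if text:
        return text
    if description:
        return description
    return None


def extract_policy_text(nodes):
    """Join the usable text of (text, description) pairs in display order."""
    parts = (node_display_text(text, description) for text, description in nodes)
    return "\n".join(part for part in parts if part)
```

For example, `extract_policy_text([("Privacy Policy", None), ("", "We collect device identifiers.")])` takes the first node's text field and the second node's description field.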
According to the privacy policy text data extraction method provided by the invention, acquiring the position information, sent by the terminal, of the privacy policy hyperlink on the terminal interface comprises the following steps:
initiating a request to the terminal so that the terminal: acquires start and end point information of the privacy policy hyperlink in the text currently displayed in the privacy policy display window by using the setSpan function of the SpannableString class; converts the start and end point information into position information of the privacy policy hyperlink in the currently displayed text by using the getTextBounds function of the TextPaint class; converts that position information into position information of the privacy policy hyperlink in the text view by using the drawTextRun function of the TextLine class; and converts the position information of the privacy policy hyperlink in the text view into position information of the privacy policy hyperlink on the terminal interface by using the getBoundsOnScreen function of the View class.
It should be noted that the start and end point information of the privacy policy hyperlink in the currently displayed text refers to the number of the line on which the hyperlink appears in the displayed text, together with the start character position and end character position within that line. The position information of the privacy policy hyperlink in the currently displayed text refers to its coordinate information in the text, namely (x1, y1). The position information of the privacy policy hyperlink in the text view refers to its coordinate information in the text view, namely (x2, y2). The position information of the privacy policy hyperlink on the terminal interface refers to its coordinate information on the screen, namely (x3, y3).
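To make the "line number plus in-line start and end positions" representation concrete, the sketch below converts character offsets in the whole text into that form. It is purely illustrative: `locate_span` and `line_starts` are hypothetical names, and it assumes the hyperlink lies entirely on one displayed line and that the character offset at which each displayed line begins is already known.

```python
from bisect import bisect_right


def locate_span(line_starts, start, end):
    """Map a [start, end) character span to (line, start_in_line, end_in_line).

    line_starts: sorted character offsets where each displayed line begins,
    with line_starts[0] == 0. Assumes the span does not cross a line break.
    """
    line = bisect_right(line_starts, start) - 1
    line_start = line_starts[line]
    return line, start - line_start, end - line_start
```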
Privacy policy hyperlinks usually appear within long texts. In the drawing flow of a TextView (text view) that contains hyperlinks, the text marked as a hyperlink is first passed into the TextView as display text by the setSpan function of the SpannableString class, and the rendered text is then drawn by the drawTextRun function of the TextLine class. In the invention, the start point and end point of the hyperlink in the text, and the character content of the hyperlink, are obtained in the setSpan function of the SpannableString class, and whether the hyperlink is a privacy policy hyperlink is determined by checking whether it contains a preset keyword. If so, the relative position of the hyperlink (its start point and end point in the text) and its character content are saved;
when the drawing flow of the TextView reaches the drawTextRun function of the TextLine class, it is checked whether a saved hyperlink exists and whether its content matches the character content at the corresponding position of the current text. If they match, the position coordinates of the hyperlink within the long text are obtained through the getTextBounds function of the TextPaint class; these coordinates are then offset by the x and y position of the long text within the View, as available in the drawTextRun function, to obtain the position coordinates of the hyperlink in the View;
and finally, the position coordinates of the View on the screen are obtained through the getBoundsOnScreen function of the View class, together with the top and left padding of the View. Offsetting the position coordinates of the hyperlink in the View by these values yields the position coordinates (absolute coordinates) of the hyperlink on the screen, which are passed to UiAutomator for clicking.
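The chain of offsets from text coordinates (x1, y1) to View coordinates (x2, y2) to screen coordinates (x3, y3) amounts to two additions. The sketch below only illustrates that arithmetic; the function and parameter names are hypothetical, and on a real device the inputs would come from the Android functions named above rather than from literals.

```python
def to_view_coords(text_xy, text_origin_in_view):
    """(x1, y1) -> (x2, y2): offset by where the text is drawn inside the View."""
    return (text_xy[0] + text_origin_in_view[0],
            text_xy[1] + text_origin_in_view[1])


def to_screen_coords(view_xy, view_origin_on_screen, padding_left, padding_top):
    """(x2, y2) -> (x3, y3): offset by the View's screen position and padding."""
    return (view_origin_on_screen[0] + padding_left + view_xy[0],
            view_origin_on_screen[1] + padding_top + view_xy[1])


# Example with made-up numbers: hyperlink at (40, 12) in the text, text drawn
# at (0, 30) inside the View, View at (100, 500) on screen with padding 16 / 8.
x2, y2 = to_view_coords((40, 12), (0, 30))
x3, y3 = to_screen_coords((x2, y2), (100, 500), padding_left=16, padding_top=8)
```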
In the privacy policy text data extraction method provided by the invention, the position information of the privacy policy hyperlink in the currently displayed text, its coordinate information in the text view, and its coordinate information on the screen are obtained in turn by calling several functions. The on-screen coordinate information is finally sent to the automatic clicking tool UiAutomator, which clicks the privacy policy hyperlink at those coordinates so that the privacy policy text data can be obtained. This overcomes the defect that AccessibilityNodeInfo cannot provide the accurate position of specific characters within a line-wrapped long text, and makes successful extraction of the privacy policy text data possible.
Fig. 2 is a schematic structural diagram of a privacy policy text data extraction apparatus provided in the present invention, and as shown in fig. 2, the apparatus includes:
the coordinate information obtaining module 210 is configured to obtain location information of a privacy policy hyperlink sent by a terminal on a terminal interface when the privacy policy display window appears on the terminal interface and the privacy policy hyperlink exists in the privacy policy display window;
a privacy policy extraction module 220, configured to click the privacy policy hyperlink based on the location information of the privacy policy hyperlink on the terminal interface, display privacy policy text data corresponding to the privacy policy hyperlink in the privacy policy display window, and extract privacy policy text data from the privacy policy display window.
According to the device for extracting the text data of the privacy policy, the hyperlink of the privacy policy is clicked by using an automatic clicking tool based on the position information of the hyperlink of the privacy policy sent by the terminal, and the text data of the privacy policy in the display window of the privacy policy is finally obtained.
According to the privacy policy text data extraction device provided by the invention, the device further comprises:
a direct extraction module, configured to extract privacy policy text data from the privacy policy display window when the privacy policy display window appears on the terminal interface and no privacy policy hyperlink exists in the privacy policy display window.
According to the privacy policy text data extraction device provided by the invention, the privacy policy text data in a privacy policy display window without a privacy policy hyperlink is obtained by using AccessibilityNodeInfo in UiAutomator, which enriches the ways of obtaining privacy policy text data and allows the extraction mode to be chosen flexibly according to how the privacy policy text data is presented.
According to the privacy policy text data extraction device provided by the invention, the device further comprises:
and the response input module is used for responding to the input of a user when the privacy policy display window does not appear on the terminal interface, displaying the privacy policy display window on the terminal interface, and extracting the privacy policy text data from the privacy policy display window.
According to the device for extracting the text data of the privacy policy, the interface where the text data of the privacy policy exists is accessed and extracted by sequentially clicking the 'my' interface and the preset keywords on the interface or sequentially clicking the registration login interface and the preset keywords on the interface; the above process mainly aims at the condition that a privacy policy display window does not exist in the terminal interface, so that accurate extraction of the text data of the privacy policy can be still realized under the condition, and the success rate of obtaining the text data is improved.
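The three situations handled by the modules above (window shown with a hyperlink, window shown without a hyperlink, window not shown) can be summarised in a short Python sketch. The enum, the function names and the example labels are hypothetical; in particular, the actual preset keywords are not listed in the text, so the ones below are placeholders.

```python
from enum import Enum
from typing import Iterable, Optional, Sequence


class Strategy(Enum):
    CLICK_HYPERLINK = "click the hyperlink, then extract"
    EXTRACT_DIRECTLY = "extract from the window as displayed"
    NAVIGATE_FIRST = "navigate until the window is displayed, then extract"


def choose_strategy(window_shown: bool, has_hyperlink: bool) -> Strategy:
    """Pick one of the three extraction paths described for the device."""
    if not window_shown:
        return Strategy.NAVIGATE_FIRST
    return Strategy.CLICK_HYPERLINK if has_hyperlink else Strategy.EXTRACT_DIRECTLY


# Hypothetical labels; the real entry names and preset keywords are assumptions.
ENTRY_LABELS = ("My", "Register", "Log in")
POLICY_KEYWORDS = ("Privacy Policy", "Privacy Agreement")


def first_clickable(labels_on_screen: Iterable[str],
                    wanted: Sequence[str]) -> Optional[str]:
    """Return the first on-screen label containing any wanted keyword, if any."""
    for label in labels_on_screen:
        if any(keyword in label for keyword in wanted):
            return label
    return None


# Navigation when no window is shown: open an entry page, then click a keyword.
entry = first_clickable(["Home", "Discover", "My"], ENTRY_LABELS)
target = first_clickable(["Settings", "About", "Privacy Policy"], POLICY_KEYWORDS)
```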
In the privacy policy text data extraction device provided by the invention, the privacy policy extraction module 220, when extracting privacy policy text data from the privacy policy display window, is specifically configured to perform:
traversing all windows under the current terminal interface, acquiring specified data of all windows under the display state, performing deduplication and arrangement on all the acquired specified data, merging the deduplicated and arranged specified data, and obtaining privacy policy text data from the merged specified data.
According to the privacy policy text data extraction device provided by the invention, the AccessibilityNodeInfo data in all windows is merged by traversing all windows under the current terminal interface to obtain the final privacy policy text data. This avoids extraction failure when the privacy policy display window is not the focus window, and improves the success rate of extracting the privacy policy text data.
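The benefit of traversing every displayed window instead of reading only the focus window can be seen in a small Python sketch. The Window type and its fields are simplified stand-ins assumed here for illustration; they are not the structures used on the terminal, and deduplication is deliberately left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Window:
    """Simplified stand-in for one window and the node texts it exposes."""
    name: str
    displayed: bool
    focused: bool
    texts: List[str] = field(default_factory=list)


def texts_from_focused(windows: Iterable[Window]) -> List[str]:
    """Baseline that reads only the focused window."""
    return [t for w in windows if w.focused for t in w.texts]


def texts_from_displayed(windows: Iterable[Window]) -> List[str]:
    """Read every displayed window, whether or not it holds focus."""
    return [t for w in windows if w.displayed for t in w.texts]


windows = [
    Window("main", displayed=True, focused=True, texts=["Welcome"]),
    Window("policy dialog", displayed=True, focused=False,
           texts=["Privacy Policy", "We collect device identifiers."]),
    Window("background", displayed=False, focused=False, texts=["stale content"]),
]
focused_only = texts_from_focused(windows)    # misses the policy dialog
all_displayed = texts_from_displayed(windows)  # includes it, skips hidden windows
```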
In the privacy policy text data extraction device provided by the invention, the privacy policy extraction module 220, when performing deduplication and arrangement on all the acquired specified data, is specifically configured to perform:
traversing the tree structures corresponding to the specified data of all windows in the display state; merging tree nodes that correspond to the same specified data in different windows, together with the subtrees of those tree nodes; and merging the tree nodes corresponding to the remaining specified data through the root node DecorView of the window.
According to the privacy policy text data extraction device provided by the invention, through carrying out duplication removal processing and arrangement processing on the data acquired in each window, the condition that a dead cycle is entered when the window is traversed can be effectively avoided, and meanwhile, the accuracy of finally obtained privacy policy text data can be ensured.
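One possible way to realise the tree merge is sketched below in Python. It is an illustration under stated assumptions, not the patented implementation: the Node type is hypothetical, two nodes are treated as "the same specified data" when their key strings are equal, and every window tree is assumed to hang under a root named DecorView. A path-based guard stops the traversal if a tree refers back to one of its own ancestors, which is one way to avoid the endless loop mentioned above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Node:
    """Hypothetical tree node; `key` identifies the data the node carries."""
    key: str
    children: List["Node"] = field(default_factory=list)


def merge_into(target: Node, source: Node, on_path: Set[int]) -> None:
    """Merge source's subtree into target, combining children with equal keys."""
    if id(source) in on_path:  # source is its own ancestor: stop instead of looping
        return
    on_path.add(id(source))
    by_key = {child.key: child for child in target.children}
    for child in source.children:
        existing = by_key.get(child.key)
        if existing is None:
            # Not seen before: attach a fresh copy under the merged parent.
            existing = Node(child.key)
            target.children.append(existing)
            by_key[child.key] = existing
        merge_into(existing, child, on_path)
    on_path.discard(id(source))


def merge_windows(roots: Iterable[Node]) -> Node:
    """Merge the trees of all displayed windows under a single DecorView root."""
    merged = Node("DecorView")
    for root in roots:
        merge_into(merged, root, set())
    return merged


window_a = Node("DecorView", [Node("title:Privacy Policy"),
                              Node("body", [Node("p1")])])
window_b = Node("DecorView", [Node("body", [Node("p1"), Node("p2")]),
                              Node("button:Agree")])
merged = merge_windows([window_a, window_b])
top_level = [child.key for child in merged.children]
body_paragraphs = [child.key for child in merged.children[1].children]
```

In the example, the `body` node appears in both windows, so it and its subtree are merged once, while the nodes that occur in only one window are attached directly under the shared root.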
In the privacy policy text data extraction device provided by the invention, the specified data is AccessibilityNodeInfo data; the privacy policy extraction module 220, when obtaining the privacy policy text data from the merged specified data, is specifically configured to perform:
under the condition that a text field in the specified data is not empty, taking the content of the text field as privacy policy text data; and under the condition that the text field in the specified data is empty and the description field is not empty, taking the content of the description field as privacy policy text data.
According to the privacy policy text data extraction device provided by the invention, text data is obtained from the merged AccessibilityNodeInfo data by using different fields under different conditions, which improves the success rate of extracting the privacy policy text data.
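The field fallback described above is simple enough to show directly. The following Python sketch represents each node as a dictionary with `text` and `description` entries; this representation and the function names are assumptions made for illustration. On Android, these two fields presumably correspond to the values returned by the getText() and getContentDescription() methods of AccessibilityNodeInfo.

```python
from typing import Dict, Iterable, Optional


def node_text(node: Dict[str, Optional[str]]) -> Optional[str]:
    """Prefer the text field; fall back to the description field if text is empty."""
    text = node.get("text")
    if text:
        return text
    description = node.get("description")
    if description:
        return description
    return None  # neither field carries content


def extract_policy_text(nodes: Iterable[Dict[str, Optional[str]]]) -> str:
    """Join the usable content of all nodes, skipping nodes with no content."""
    parts = [node_text(node) for node in nodes]
    return "\n".join(part for part in parts if part)


nodes = [
    {"text": "Privacy Policy", "description": None},
    {"text": "", "description": "We collect device identifiers."},
    {"text": None, "description": None},
]
policy_text = extract_policy_text(nodes)
```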
In the privacy policy text data extraction device provided by the invention, the coordinate information obtaining module 210, when obtaining the position information, sent by the terminal, of the privacy policy hyperlink on the terminal interface, is specifically configured to perform:
initiating a request to the terminal, so that the terminal acquires start and end point information of the privacy policy hyperlink in the text currently displayed in the privacy policy display window by using the setSpan function of the span class; converts the start and end point information into position information of the privacy policy hyperlink in the text currently displayed in the privacy policy display window by using the getTextBounds function of the TextPaint class; converts that position information into position information of the privacy policy hyperlink in the text view by using the drawTextRun function of the TextLine class; and converts the position information of the privacy policy hyperlink in the text view into position information of the privacy policy hyperlink on the terminal interface by using the getBoundsOnScreen function of the View class.
According to the privacy policy text data extraction device provided by the invention, the position information of the privacy policy hyperlink in the currently displayed text, its coordinate information in the text view, and its coordinate information on the screen are obtained in sequence by calling a plurality of functions. The on-screen coordinate information is then sent to the automatic clicking tool UiAutomator, which clicks the privacy policy hyperlink at those coordinates so that the privacy policy text data is finally obtained.
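Why the start and end offsets of the hyperlink matter for line-wrapped text can be shown with a deliberately simplified Python model. It does not reproduce the getTextBounds or drawTextRun calls named above; instead it assumes a fixed-width font, a fixed number of characters per line and a fixed line height, all of which are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def span_boxes(start: int, end: int, chars_per_line: int,
               char_width: int, line_height: int) -> List[Box]:
    """Return one box per line touched by the character range [start, end)."""
    boxes: List[Box] = []
    first_line = start // chars_per_line
    last_line = (end - 1) // chars_per_line
    for line in range(first_line, last_line + 1):
        line_start = line * chars_per_line
        col_from = max(start, line_start) - line_start
        col_to = min(end, line_start + chars_per_line) - line_start
        boxes.append((col_from * char_width, line * line_height,
                      col_to * char_width, (line + 1) * line_height))
    return boxes


def center(box: Box) -> Tuple[int, int]:
    return ((box[0] + box[2]) // 2, (box[1] + box[3]) // 2)


text = "By continuing you agree to the Privacy Policy of this app."
link = "Privacy Policy"
start = text.index(link)
end = start + len(link)

# Assumed layout: 40 characters per line, 20 px per character, 50 px per line.
boxes = span_boxes(start, end, chars_per_line=40, char_width=20, line_height=50)
click_point = center(boxes[0])  # a point that is certainly on the hyperlink
```

In this example the hyperlink wraps from the end of the first line to the start of the second, so it occupies two separate boxes. The centre of a single rectangle enclosing both boxes would fall on ordinary text between them, which is why a click point is taken from one of the per-line boxes instead.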
Fig. 3 illustrates a physical structure diagram of an electronic device. As shown in fig. 3, the electronic device may include a processor 310, a communications interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communications interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a privacy policy text data extraction method comprising: obtaining position information, sent by a terminal, of a privacy policy hyperlink on a terminal interface in the case where a privacy policy display window appears on the terminal interface and the privacy policy hyperlink exists in the privacy policy display window; clicking the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, so that privacy policy text data corresponding to the privacy policy hyperlink is displayed in the privacy policy display window; and extracting the privacy policy text data from the privacy policy display window.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the privacy policy text data extraction method provided above, the method comprising: obtaining position information, sent by a terminal, of a privacy policy hyperlink on a terminal interface in the case where a privacy policy display window appears on the terminal interface and the privacy policy hyperlink exists in the privacy policy display window; clicking the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, so that privacy policy text data corresponding to the privacy policy hyperlink is displayed in the privacy policy display window; and extracting the privacy policy text data from the privacy policy display window.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the privacy policy text data extraction method provided above, the method comprising: obtaining position information, sent by a terminal, of a privacy policy hyperlink on a terminal interface in the case where a privacy policy display window appears on the terminal interface and the privacy policy hyperlink exists in the privacy policy display window; clicking the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, so that privacy policy text data corresponding to the privacy policy hyperlink is displayed in the privacy policy display window; and extracting the privacy policy text data from the privacy policy display window.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A privacy policy text data extraction method is characterized by comprising the following steps:
obtaining position information, sent by a terminal, of a privacy policy hyperlink on a terminal interface in the case where a privacy policy display window appears on the terminal interface and the privacy policy hyperlink exists in the privacy policy display window;
and clicking the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, displaying privacy policy text data corresponding to the privacy policy hyperlink in the privacy policy display window, and extracting the privacy policy text data from the privacy policy display window.
2. The privacy policy text data extraction method of claim 1, wherein the method further comprises:
and under the condition that a privacy policy display window appears on a terminal interface and no privacy policy hyperlink exists in the privacy policy display window, extracting privacy policy text data from the privacy policy display window.
3. The privacy policy text data extraction method according to claim 1 or 2, the method further comprising:
and under the condition that the privacy policy display window does not appear on the terminal interface, responding to the input of a user, displaying the privacy policy display window on the terminal interface, and extracting the privacy policy text data from the privacy policy display window.
4. The privacy policy text data extraction method according to claim 1 or 2, wherein the extracting privacy policy text data from the privacy policy display window includes:
traversing all windows under the current terminal interface, acquiring specified data of all windows under the display state, performing deduplication and arrangement on all the acquired specified data, merging the deduplicated and arranged specified data, and obtaining privacy policy text data from the merged specified data.
5. The privacy policy text data extraction method according to claim 4, wherein the performing deduplication and arrangement on all the acquired specified data comprises:
traversing the tree structures corresponding to the specified data of all windows in the display state;
merging tree nodes that correspond to the same specified data in different windows, together with the subtrees of those tree nodes;
and merging the tree nodes corresponding to the remaining specified data through the root node DecorView of the window.
6. The method of claim 4, wherein the specified data is AccessibilityNodeInfo data;
correspondingly, the obtaining privacy policy text data from the merged specified data includes:
under the condition that a text field in the specified data is not empty, taking the content of the text field as privacy policy text data;
and under the condition that the text field in the specified data is empty and the description field is not empty, taking the content of the description field as privacy policy text data.
7. The method for extracting text data of a privacy policy according to claim 1, wherein the obtaining of the location information of the privacy policy hyperlink sent by the terminal on the terminal interface comprises:
initiating a request to a terminal so that the terminal acquires start and end point information of the privacy policy hyperlink in a text currently displayed in the privacy policy display window by using a setSpan function of a span class; converting the start and end point information into position information of the privacy policy hyperlink in the text currently displayed in the privacy policy display window by utilizing a getTextBounds function of a TextPaint class; converting the position information of the privacy policy hyperlink into the position information of the privacy policy hyperlink in the text view by using a drawTextRun function of a TextLine class; and converting the position information of the privacy policy hyperlink in the text view into the position information of the privacy policy hyperlink on the terminal interface by utilizing a getBoundsOnScreen function of a View class.
8. A privacy policy text data extraction apparatus, comprising:
the coordinate information acquisition module is used for acquiring the position information of the privacy policy hyperlink, sent by the terminal, on the terminal interface under the condition that the privacy policy display window appears on the terminal interface and the privacy policy hyperlink exists in the privacy policy display window;
and the privacy policy extraction module is used for clicking the privacy policy hyperlink based on the position information of the privacy policy hyperlink on the terminal interface, displaying privacy policy text data corresponding to the privacy policy hyperlink in the privacy policy display window, and extracting the privacy policy text data from the privacy policy display window.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for extracting privacy policy text data according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the privacy policy text data extraction method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method for extracting privacy policy text data according to any one of claims 1 to 7.
CN202111522146.9A 2021-12-13 2021-12-13 Privacy policy text data extraction method and device, electronic equipment and storage medium Active CN114417396B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111522146.9A CN114417396B (en) 2021-12-13 2021-12-13 Privacy policy text data extraction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111522146.9A CN114417396B (en) 2021-12-13 2021-12-13 Privacy policy text data extraction method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114417396A true CN114417396A (en) 2022-04-29
CN114417396B CN114417396B (en) 2023-03-24

Family

ID=81266323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111522146.9A Active CN114417396B (en) 2021-12-13 2021-12-13 Privacy policy text data extraction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114417396B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423446A (en) * 2017-08-11 2017-12-01 义乌工商职业技术学院 New media based on cloud storage automates credible deployment system and method
CN111766993A (en) * 2020-05-29 2020-10-13 维沃移动通信有限公司 Information display method and device, electronic equipment and readable storage medium
WO2020247405A1 (en) * 2019-06-03 2020-12-10 Jpmorgan Chase Bank, N.A. Systems and methods for managing privacy policies using machine learning
CN112181255A (en) * 2020-10-12 2021-01-05 深圳市欢太科技有限公司 Control identification method and device, terminal equipment and storage medium
CN112565238A (en) * 2020-11-30 2021-03-26 杭州华橙软件技术有限公司 Method for popping privacy policy, client and computer-readable storage medium
CN112631704A (en) * 2020-12-26 2021-04-09 深圳集智数字科技有限公司 Interface element identification method and device, storage medium and electronic equipment
CN113051607A (en) * 2021-03-11 2021-06-29 天津大学 Privacy policy information extraction method
CN113076538A (en) * 2021-04-02 2021-07-06 北京邮电大学 Method for extracting embedded privacy policy of mobile application APK file
CN113177205A (en) * 2021-04-27 2021-07-27 国家计算机网络与信息安全管理中心 Malicious application detection system and method
CN113254923A (en) * 2021-06-25 2021-08-13 南京网眼信息技术有限公司 Method and system for generating privacy policy text according to APK (android package)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
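张铭喆 (Zhang Mingzhe): "Design and Implementation of a Privacy Agreement Extraction and Verification System for Android Applications", 《信息科技》 (Information Technology) *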

Also Published As

Publication number Publication date
CN114417396B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 201100 floor 3, building 3, No. 2555, Hechuan Road, Minhang District, Shanghai
Applicant after: Qi'an Pangu (Shanghai) Information Technology Co.,Ltd.
Applicant after: Qianxin Technology Group Co.,Ltd.
Applicant after: Qianxin Wangshen information technology (Beijing) Co.,Ltd.
Address before: 201100 floor 3, building 3, No. 2555, Hechuan Road, Minhang District, Shanghai
Applicant before: Qi'an Pangu (Shanghai) Information Technology Co.,Ltd.
Applicant before: Qianxin Technology Group Co.,Ltd.
Applicant before: LEGENDSEC INFORMATION TECHNOLOGY (BEIJING) Inc.
GR01 Patent grant