CN115292647A - Non-invasive government data acquisition method - Google Patents

Publication number
CN115292647A
Authority
CN
China
Prior art keywords
information
virtual machine
machine platform
platform
acquiring
Legal status
Pending
Application number
CN202211219122.0A
Other languages
Chinese (zh)
Inventor
傅涛
Current Assignee
Beijing Yitethink Information Technology Co ltd
Original Assignee
Beijing Yitethink Information Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/038 Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of government affairs information processing and discloses a non-invasive government affairs data acquisition method comprising the following steps: obtaining target platform information of an execution platform according to an information acquisition task, and obtaining a compatible version of the required operating environment according to the target platform information; configuring a virtual machine platform with the compatible version according to the information acquisition task, and updating the identifier of the virtual machine platform and the polling database of the virtual machine platform; obtaining the information acquisition task and a method class matching it, and loading and instantiating the method class; simulating a user operation event with the instantiated method class and, in response to the user operation event, acquiring information by screen capture or by copying text through keyboard and mouse events; and updating the state of the information acquisition task according to the information acquired by the virtual machine platform. The invention can dynamically configure resources for different types of requirements when acquiring e-government data, and has good adaptability and scalability.

Description

Non-invasive government data acquisition method
Technical Field
The invention relates to the field of government affair information processing, in particular to a non-invasive government affair data obtaining method.
Background
Robotic process automation (RPA) is a rule-based software technology in which software robots automate manual operations by performing repetitive, rule-governed tasks.
In government affairs information processing, acquiring and processing information involves multiple terminals and data sources, and data processing must take speed, compatibility and resource scheduling into account. For example, some pages are designed so that they can only be accessed in compatibility mode by a particular browser or on a specific operating system; some information can only be obtained by running an App inside a virtual machine; and some information sources place requirements on the rendering environment. Multiple clients and configurations are therefore needed to meet these requirements.
In addition, some information pages disable background debugging and return different output to different visitors, which requires additional configuration.
Disclosure of Invention
The present invention is directed to overcoming one or more of the problems set forth above and to providing a method for non-intrusive government data acquisition.
In order to achieve the above object, the present invention provides a non-invasive government affairs data acquisition method, including:
acquiring target platform information of an execution platform according to an information acquisition task, and acquiring a compatible version of a required operating environment according to the target platform information;
configuring a virtual machine platform with a compatible version according to the information acquisition task, and updating an identifier of the virtual machine platform and a polling database of the virtual machine platform;
acquiring the information acquisition task and a method class matched with the information acquisition task, and loading and instantiating the method class;
simulating a user operation event by using the instantiated method class and, in response to the user operation event, acquiring information by capturing the screen or by copying text through keyboard and mouse events;
and updating the state of the information acquisition task according to the information acquired by the virtual machine platform.
According to one aspect of the invention, the target platform information includes runtime information, system version information, browser information, client information, network address information, software information and configuration information;
the runtime information comprises the JRE runtime, the .NET Framework runtime, the Python runtime and the Lua runtime;
the system version information comprises Windows, Linux and NeoKylin;
the browser information comprises IE, Firefox, Edge, Chrome, 360 Browser, Opera, Tencent browser and Baidu browser;
the client information comprises the version of the client, the client being installed on the virtual machine platform and used to fetch instructions from the polling database;
the network address information is the configured IP address or the proxy IP address in use;
the software information is the name and version of each program installed on the virtual machine platform;
the configuration information comprises the number of cores, the core frequency, the maximum configured memory, the used memory and the hard disk size configured for the virtual machine platform.
According to one aspect of the invention, a compatible version of the required client is obtained according to the target platform information; the target platform information is expanded to cover the software or resources involved when the information acquisition task runs; a compatible version of the operating environment required by the target platform information is obtained; and all instances of the virtual machine platform are traversed to obtain the instances that conform to the target platform information.
According to one aspect of the invention, the virtual machine platforms are traversed according to the compatible version of the required operating environment, the virtual machine platforms whose operating environment has the compatible version are selected, the set of identifiers of those virtual machine platforms is obtained, and the identifier set is added to the database.
According to one aspect of the invention, all unexecuted tasks on the server are obtained, and the identifiers of the virtual machine platforms matched with the unexecuted tasks are extracted together with the frequency with which each identifier occurs; when the frequency of an identifier exceeds a threshold, a copy of that virtual machine platform is created through the local VM API, the network environment of the copy is configured, and the copy is registered as a new virtual machine platform; the VM API is the Hyper-V API or the VMware API.
According to one aspect of the invention, the method class is compiled bytecode or a script; when the method class is bytecode, the client on the virtual machine platform loads the compiled bytecode, creates an instance of it and executes the information acquisition operation; when the method class is a script, the corresponding interpreter is called to execute the script; the scripts include Lua scripts, JavaScript scripts and Python scripts.
According to one aspect of the invention, the simulated user operation events comprise moving the mouse, pressing a mouse button, clicking the mouse, releasing a mouse button, pressing a key, releasing a key, sending a keyboard message to a window and sending a mouse message to a window.
According to one aspect of the invention, the client captures a screen image for information acquisition and determines whether the captured region contains the element into which information is to be entered; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and enters the characters specified in the acquisition task by simulating keyboard input events.
According to one aspect of the invention, the client captures a screen image for information acquisition and, after the simulated keyboard input event, determines whether the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
According to one aspect of the invention, the client captures a screen image for information acquisition and determines whether the captured region contains the element into which information is to be entered; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and performs the mouse operation specified in the acquisition task by simulating mouse input.
According to one aspect of the invention, the client captures a screen image for information acquisition and, after the simulated mouse input event, determines whether the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
According to one aspect of the invention, the client simulates the mouse or keyboard to select the content rendered in a control within the screen image, copies the selected content to the clipboard, and reads and saves the data in the clipboard.
According to one aspect of the invention, the screen image captured by the client is copied to the clipboard, the screen image is recognized to obtain the content it contains, and the data is saved.
According to one aspect of the invention, the client records the active time of the client and of the virtual machine platform while executing the information acquisition task.
According to one aspect of the invention, all unexecuted tasks on the server and all virtual machine platforms matched with the unexecuted tasks are obtained, and the difference set between the registered virtual machine platforms and the matched ones is computed; the elements of the difference set are traversed, and if the idle time of a virtual machine platform in the difference set exceeds a threshold, that virtual machine is shut down through the local VM API.
To achieve the above object, the present invention also provides a non-invasive government affairs data acquisition system, comprising:
an operating environment acquisition module: acquiring target platform information of an execution platform according to an information acquisition task, and acquiring a compatible version of a required operating environment according to the target platform information;
a virtual machine platform establishing module: configuring a virtual machine platform with a compatible version according to the information acquisition task, and updating an identifier of the virtual machine platform and a polling database of the virtual machine platform;
a method class processing module: acquiring the information acquisition task and a method class matched with the information acquisition task, and loading and instantiating the method class;
an information acquisition module: simulating a user operation event by using the instantiated method class and, in response to the user operation event, acquiring information by capturing the screen or by copying text through keyboard and mouse events;
and a task updating module: updating the state of the information acquisition task according to the information acquired by the virtual machine platform.
Based on this, the beneficial effects of the invention are:
(1) E-government data is acquired non-invasively, which avoids the erroneous information that other acquisition methods may introduce;
(2) Acquisition is carried out on running virtual machine resources, which can be expanded dynamically on demand, so the method scales well;
(3) Task compatibility is determined by inspecting the client, the environment and the information on each running virtual machine, so that tasks can be distributed and scheduled;
(4) Resource usage is monitored so that capacity can be expanded and reduced dynamically, which optimizes resource scheduling.
Drawings
Figure 1 schematically shows a flow chart of a method of non-intrusive government data acquisition in accordance with the present invention;
Figure 2 schematically shows a flow chart of a non-invasive government data acquisition system according to the present invention.
Detailed Description
The present invention will now be discussed with reference to exemplary embodiments, it being understood that the embodiments discussed are only for the purpose of enabling a person of ordinary skill in the art to better understand and thus implement the contents of the present invention, and do not imply any limitation on the scope of the present invention.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on", and the terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".
Fig. 1 is a flow chart schematically showing a non-invasive government data acquisition method according to the present invention, and as shown in fig. 1, the non-invasive government data acquisition method of the present invention comprises:
acquiring target platform information of an execution platform according to the information acquisition task, and acquiring a compatible version of a required operating environment according to the target platform information;
configuring a virtual machine platform with a compatible version according to the information acquisition task, and updating an identifier of the virtual machine platform and a polling database of the virtual machine platform;
acquiring an information acquisition task and a method class matched with the information acquisition task, and loading and instantiating the method class;
simulating a user operation event by using the instantiated method class, responding to the user operation event, and acquiring information by intercepting a screen or copying a text through a keyboard and mouse event;
and updating the state of the information acquisition task according to the information acquired by the virtual machine platform.
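By way of illustration only, the polling and state-update steps above might be sketched as follows. The table layout, the column names, the status values and the example address are assumptions of this sketch and do not appear in the original text; an in-memory SQLite database stands in for the polling database.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical task table: one row per information acquisition task.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, vm_id TEXT, payload TEXT, "
        "status TEXT DEFAULT 'pending', result TEXT)"
    )


def poll_next_task(conn, vm_id):
    """Return (task_id, payload) of the oldest pending task for this VM, or None."""
    row = conn.execute(
        "SELECT id, payload FROM tasks "
        "WHERE vm_id = ? AND status = 'pending' ORDER BY id LIMIT 1",
        (vm_id,),
    ).fetchone()
    if row is None:
        return None
    # Mark the task as taken so that it is not polled again.
    conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return row


def update_task_state(conn, task_id, result):
    """Record the acquired information and the resulting task state."""
    status = "done" if result is not None else "failed"
    conn.execute(
        "UPDATE tasks SET status = ?, result = ? WHERE id = ?",
        (status, result, task_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO tasks (vm_id, payload) VALUES (?, ?)",
    ("vm-01", "https://example.invalid/notice"),
)
task = poll_next_task(conn, "vm-01")
update_task_state(conn, task[0], "page text")
```

In this sketch the client on each virtual machine platform only ever reads rows carrying its own identifier, which is one simple way of letting the database perform the task allocation.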
According to one embodiment of the invention, the target platform information comprises runtime information, system version information, browser information, client information, network address information, software information and configuration information;
the runtime information comprises the JRE runtime, the .NET Framework runtime, the Python runtime and the Lua runtime;
the system version information comprises Windows, Linux and NeoKylin;
the browser information comprises IE, Firefox, Edge, Chrome, 360 Browser, Opera, Tencent browser and Baidu browser;
the client information comprises the version of the client, the client being installed on the virtual machine platform and used to fetch instructions from the polling database;
the network address information is the configured IP address or the proxy IP address in use;
the software information is the name and version of each program installed on the virtual machine platform;
the configuration information comprises the number of cores, the core frequency, the maximum configured memory, the used memory and the hard disk size configured for the virtual machine platform.
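As an illustrative sketch, the target platform information listed above could be held in a record such as the following. The class names, field names, units and example values are assumptions made for this sketch; the original text names only the categories of information.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VmConfiguration:
    cores: int
    core_frequency_mhz: int
    max_memory_mb: int
    used_memory_mb: int
    disk_gb: int

    @property
    def free_memory_mb(self) -> int:
        # Memory still available to the virtual machine platform.
        return self.max_memory_mb - self.used_memory_mb


@dataclass
class TargetPlatformInfo:
    system: str                      # system version information
    client_version: str              # version of the polling client
    network_address: str             # configured IP or proxy IP in use
    configuration: VmConfiguration
    runtimes: Dict[str, str] = field(default_factory=dict)   # name -> version
    browsers: Dict[str, str] = field(default_factory=dict)   # name -> version
    software: Dict[str, str] = field(default_factory=dict)   # name -> version


info = TargetPlatformInfo(
    system="Windows 7",
    client_version="1.0.0",
    network_address="192.0.2.10",
    configuration=VmConfiguration(
        cores=2, core_frequency_mhz=2400,
        max_memory_mb=4096, used_memory_mb=1024, disk_gb=60,
    ),
    runtimes={"JRE": "8", "Python": "3.8"},
    browsers={"IE": "8"},
)
```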
According to one embodiment of the invention, a compatible version of the required client is obtained according to the target platform information; the target platform information is expanded to cover the software or resources involved when the information acquisition task runs; a compatible version of the operating environment required by the target platform information is obtained; and all instances of the virtual machine platform are traversed to obtain the instances that conform to the target platform information.
According to one embodiment of the invention, the virtual machine platforms are traversed according to the compatible version of the required operating environment, the virtual machine platforms whose operating environment has the compatible version are selected, the set of identifiers of those virtual machine platforms is obtained, and the identifier set is added to the database.
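A minimal sketch of this traversal is given below. The original text does not define what makes a version "compatible"; this sketch assumes, purely for illustration, that a platform is compatible when it runs the required system and every required runtime and browser is installed at or above a minimum version. The dictionary keys and identifiers are likewise hypothetical.

```python
def version_tuple(version):
    """Turn '3.10' into (3, 10) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(required, installed):
    """True if every required item is installed at or above its minimum version."""
    for name, minimum in required.items():
        have = installed.get(name)
        if have is None or version_tuple(have) < version_tuple(minimum):
            return False
    return True


def select_compatible_vms(requirement, vms):
    """Traverse all VM platforms and return the identifier set of the compatible ones."""
    identifiers = set()
    for vm in vms:
        if vm["system"] != requirement["system"]:
            continue
        if satisfies(requirement.get("runtimes", {}), vm["runtimes"]) and \
           satisfies(requirement.get("browsers", {}), vm["browsers"]):
            identifiers.add(vm["id"])
    return identifiers


vms = [
    {"id": "vm-01", "system": "Windows", "runtimes": {"python": "3.10"}, "browsers": {"IE": "8"}},
    {"id": "vm-02", "system": "Linux", "runtimes": {"python": "3.11"}, "browsers": {"Chrome": "120"}},
    {"id": "vm-03", "system": "Windows", "runtimes": {"python": "3.6"}, "browsers": {"IE": "8"}},
]
requirement = {"system": "Windows", "runtimes": {"python": "3.8"}, "browsers": {"IE": "8"}}
compatible_ids = select_compatible_vms(requirement, vms)
```

Comparing versions as integer tuples rather than strings avoids ranking "3.10" below "3.8". A source that needs an old browser would call for an upper bound rather than a minimum, which this sketch does not model.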
According to one embodiment of the invention, all unexecuted tasks on the server are obtained, and the identifiers of the virtual machine platforms matched with the unexecuted tasks are extracted together with the frequency with which each identifier occurs; when the frequency of an identifier exceeds a threshold, a copy of that virtual machine platform is created through the local VM API, the network environment of the copy is configured, and the copy is registered as a new virtual machine platform; the VM API is the Hyper-V API or the VMware API.
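The scale-out rule can be sketched as follows. The task layout and function names are assumptions of this sketch, and `clone_vm`, `configure_network` and `register_vm` are hypothetical callables standing in for the Hyper-V or VMware calls, which are not reproduced here.

```python
from collections import Counter


def vms_to_clone(pending_tasks, threshold):
    """Identifiers that are matched by more pending tasks than the threshold allows."""
    counts = Counter(
        vm_id for task in pending_tasks for vm_id in task["matched_vm_ids"]
    )
    return sorted(vm_id for vm_id, n in counts.items() if n > threshold)


def scale_out(pending_tasks, threshold, clone_vm, configure_network, register_vm):
    """Clone every overloaded VM platform, configure the copy and register it."""
    created = []
    for vm_id in vms_to_clone(pending_tasks, threshold):
        copy_id = clone_vm(vm_id)      # stand-in for the local VM API call
        configure_network(copy_id)
        register_vm(copy_id)
        created.append(copy_id)
    return created


pending = [
    {"matched_vm_ids": ["vm-01"]},
    {"matched_vm_ids": ["vm-01", "vm-02"]},
    {"matched_vm_ids": ["vm-01"]},
]
registered = []
created = scale_out(
    pending,
    threshold=2,
    clone_vm=lambda vm_id: vm_id + "-copy",
    configure_network=lambda vm_id: None,
    register_vm=registered.append,
)
```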
According to one embodiment of the invention, the method class is compiled bytecode or a script; when the method class is bytecode, the client on the virtual machine platform loads the compiled bytecode, creates an instance of it and executes the information acquisition operation; when the method class is a script, the corresponding interpreter is called to execute the script; the scripts include Lua scripts, JavaScript scripts and Python scripts.
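The two loading paths might be sketched as below. A Python code object is used here as a stand-in for compiled bytecode; the original text does not say which bytecode format is meant. The class name `NoticeCollector`, its `acquire` method, and the interpreter commands `lua` and `node` are assumptions of this sketch.

```python
import subprocess
import sys

# Hypothetical mapping from script kind to interpreter command line.
INTERPRETERS = {"python": [sys.executable], "lua": ["lua"], "js": ["node"]}


def run_bytecode_method_class(code_obj, class_name, task):
    """Load compiled code, instantiate the method class and run the acquisition."""
    namespace = {}
    exec(code_obj, namespace)            # load the compiled method class
    instance = namespace[class_name]()   # create an instance of it
    return instance.acquire(task)


def run_script(kind, path, task_arg):
    """Call the interpreter that matches the script kind and return its output."""
    command = INTERPRETERS[kind] + [path, task_arg]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


SOURCE = '''
class NoticeCollector:
    def acquire(self, task):
        return "collected:" + task
'''
code_obj = compile(SOURCE, "<method-class>", "exec")
result = run_bytecode_method_class(code_obj, "NoticeCollector", "task-1")
```

Keeping the interpreter lookup in one table means a further script type can be supported by adding an entry rather than changing the dispatch code.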
According to one embodiment of the invention, the simulated user operation events comprise moving the mouse, pressing a mouse button, clicking the mouse, releasing a mouse button, pressing a key, releasing a key, sending a keyboard message to a window and sending a mouse message to a window.
According to one embodiment of the invention, the client captures a screen image for information acquisition and determines whether the captured region contains the element into which information is to be entered; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and enters the characters specified in the acquisition task by simulating keyboard input events.
According to one embodiment of the invention, the client captures a screen image for information acquisition and, after the simulated keyboard input event, determines whether the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
According to one embodiment of the invention, the client captures a screen image for information acquisition and determines whether the captured region contains the element into which information is to be entered; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and performs the mouse operation specified in the acquisition task by simulating mouse input.
According to one embodiment of the invention, the client captures a screen image for information acquisition and, after the simulated mouse input event, determines whether the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
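The capture, locate, act and verify sequence described in the preceding embodiments can be sketched without reference to any particular GUI automation library. The four callables passed to `operate_on_element` and the `FakeScreen` class are assumptions of this sketch; a real client would replace them with actual screen capture and input simulation.

```python
class ElementNotFound(Exception):
    """Raised when the captured region does not contain the target element."""


def operate_on_element(capture, locate, move_mouse, act):
    """Capture, locate the element, act on it, and report whether the screen changed."""
    before = capture()
    position = locate(before)
    if position is None:
        raise ElementNotFound("target element is not in the captured region")
    move_mouse(position)
    act()                      # simulated keyboard or mouse input
    after = capture()
    return after != before     # did the expected kind of change occur?


class FakeScreen:
    """In-memory stand-in for a real screen, used only to exercise the sketch."""

    def __init__(self):
        self.field = ""
        self.pointer = None

    def capture(self):
        return ("search-box", self.field)

    def locate(self, image):
        return (120, 40) if image[0] == "search-box" else None

    def move_mouse(self, position):
        self.pointer = position

    def type_text(self, text):
        self.field += text


screen = FakeScreen()
changed = operate_on_element(
    screen.capture,
    screen.locate,
    screen.move_mouse,
    lambda: screen.type_text("2022 budget"),
)
```

Passing the action as a callable lets the same routine cover both the keyboard case and the mouse case.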
According to one embodiment of the invention, the client simulates the mouse or keyboard to select the content rendered in a control within the screen image, copies the selected content to the clipboard, and reads and saves the data in the clipboard.
According to one embodiment of the invention, the screen image captured by the client is copied to the clipboard, the screen image is recognized to obtain the content it contains, and the data is saved.
According to one embodiment of the invention, the client records the active time of the client and of the virtual machine platform while executing the information acquisition task.
According to one embodiment of the invention, all unexecuted tasks on the server and all virtual machine platforms matched with the unexecuted tasks are obtained, and the difference set between the registered virtual machine platforms and the matched ones is computed; the elements of the difference set are traversed, and if the idle time of a virtual machine platform in the difference set exceeds a threshold, that virtual machine is shut down through the local VM API.
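A minimal sketch of this scale-in rule follows. It assumes that the difference set is the set of registered virtual machine platforms minus those matched with any unexecuted task; the task layout, the idle-time table and the one-hour threshold are illustrative assumptions.

```python
def vms_to_shut_down(all_vm_ids, pending_tasks, idle_seconds, threshold_seconds):
    """Return unmatched VM platforms whose idle time exceeds the threshold."""
    matched = {
        vm_id for task in pending_tasks for vm_id in task["matched_vm_ids"]
    }
    unmatched = set(all_vm_ids) - matched          # the difference set
    return sorted(
        vm_id for vm_id in unmatched
        if idle_seconds.get(vm_id, 0) > threshold_seconds
    )


all_vms = ["vm-01", "vm-02", "vm-03"]
pending = [{"matched_vm_ids": ["vm-01"]}]
idle = {"vm-01": 7200, "vm-02": 5400, "vm-03": 60}
to_close = vms_to_shut_down(all_vms, pending, idle, threshold_seconds=3600)
```

Here `vm-01` stays up despite its long idle time because a pending task still matches it, and `vm-03` stays up because it has not been idle long enough.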
According to one embodiment of the invention, data source 1 is information published on a conventional government website: the page content can be obtained by accessing an address, pages can be reached by clicking links, and the pages are compatible with common browsers;
data source 2 is information published on a conventional government website whose pages cannot be accessed without logging in: after login, the page content can be obtained by accessing an address, pages can be reached by clicking links, and the pages are compatible with common browsers;
data source 3 is a website that is compatible only with old versions of IE (earlier than IE 8) and not with mainstream browsers, so an operating system matching the compatible browser must be configured to obtain correctly displayed content;
data source 4 is information published on a conventional government website whose pages cannot be accessed without logging in: after login, the page content can be obtained by accessing an address and pages can be reached by clicking links, but the target content is image content rather than conventional text, and the pages are compatible with common browsers;
data source 5 is information published on a conventional government website whose pages cannot be accessed without logging in: after login, the page content can be obtained by accessing an address and pages can be reached by clicking links, but the target content is not conventional text and is instead contained in the links of the page, and the pages are compatible with common browsers;
data source 6 is information published through a government department App whose pages cannot be accessed without logging in: after login, the page content can be obtained by accessing an address and pages can be reached by clicking links, but the target content is not conventional text and must be captured from the screen and then post-processed.
Multiple virtual machine platforms are used to run multiple RPA robots. Tasks are allocated to the virtual machine platforms through the database, and information acquisition is carried out by adding, deleting and modifying RPA robot tasks in the database; at the same time, configuring multiple virtual machine platforms locally allows one server to serve different functions;
where screenshots apply to data sources 4-6 and keyboard and mouse events apply to data sources 1-3, because keyboard and mouse events can capture text.
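The assignment of acquisition modes to data sources stated above can be written down as a small lookup. The enum and function names are assumptions of this sketch; only the 1-3 and 4-6 split comes from the text.

```python
from enum import Enum


class AcquisitionMode(Enum):
    KEYBOARD_MOUSE = "keyboard_mouse"   # select and copy text via simulated input
    SCREENSHOT = "screenshot"           # capture the screen and post-process it


def acquisition_mode(data_source):
    """Map a data source number (1 to 6) to the acquisition mode the text assigns it."""
    if data_source in (1, 2, 3):
        return AcquisitionMode.KEYBOARD_MOUSE
    if data_source in (4, 5, 6):
        return AcquisitionMode.SCREENSHOT
    raise ValueError(f"unknown data source: {data_source}")


modes = {n: acquisition_mode(n).value for n in range(1, 7)}
```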
According to one embodiment of the invention, images of different systems can be configured on the host, with different systems, clients and runtimes configured in each image, so that different information can be acquired. If the real-time resources of the virtual machine platforms cannot meet demand, acquisition capacity is increased by creating a new virtual machine platform.
According to one embodiment of the invention, the task configuration of the virtual machine platform is determined from the information acquisition task and the task requirements in the database; the virtual machine platforms that satisfy this configuration are identified; the predicted completion time of the task is determined from those virtual machine platforms; and if the predicted completion time exceeds the information acquisition threshold (for example, 24 hours), a new virtual machine platform satisfying the task configuration is created.
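The original text does not say how the completion time is predicted. The sketch below assumes a deliberately simple estimate, in which pending tasks are spread evenly over the matching virtual machine platforms and each task takes an average duration; the function names and figures are illustrative only, while the 24-hour threshold is the example value given in the text.

```python
import math


def predicted_completion_hours(pending_tasks, avg_task_hours, matching_vms):
    """Estimate the hours needed if tasks are spread evenly over the matching VMs."""
    if matching_vms == 0:
        return math.inf        # nothing can run the task at all
    return math.ceil(pending_tasks / matching_vms) * avg_task_hours


def needs_new_vm(pending_tasks, avg_task_hours, matching_vms, threshold_hours=24.0):
    """True if the estimate exceeds the information acquisition threshold."""
    estimate = predicted_completion_hours(pending_tasks, avg_task_hours, matching_vms)
    return estimate > threshold_hours


estimate_two_vms = predicted_completion_hours(100, 0.5, 2)
create_with_two = needs_new_vm(100, 0.5, 2)
create_with_three = needs_new_vm(100, 0.5, 3)
```

With 100 pending half-hour tasks, two matching platforms give an estimate of 25 hours, which exceeds the threshold, while three give 17 hours, which does not.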
Furthermore, to achieve the above object, the present invention provides a non-invasive government affairs data acquisition system. Fig. 2 is a flow chart schematically showing the system according to the present invention; as shown in Fig. 2, the system comprises:
an operating environment acquisition module: acquiring target platform information of an execution platform according to the information acquisition task, and acquiring a compatible version of a required operating environment according to the target platform information;
a virtual machine platform establishing module: configuring a virtual machine platform with a compatible version according to the information acquisition task, and updating an identifier of the virtual machine platform and a polling database of the virtual machine platform;
a method class processing module: acquiring an information acquisition task and a method class matched with the information acquisition task, and loading and instantiating the method class;
an information acquisition module: simulating a user operation event by using the instantiated method class and, in response to the user operation event, acquiring information by capturing the screen or by copying text through keyboard and mouse events;
and a task updating module: updating the state of the information acquisition task according to the information acquired by the virtual machine platform.
According to one embodiment of the present invention, the target platform information includes runtime information, system version information, browser information, client information, network address information, software information, configuration information;
the runtime information of (1) comprises JRE,. NetFramework, python runtime, LUA runtime;
the system version information comprises Windows, linux and winning symbol kylin;
the browser information comprises IE, firefox, edge, chrome, 360, opear, tencent browser and Baidu browser;
the client information comprises the version of the client, and the client is a client installed on the virtual machine platform and used for acquiring instructions in the polling database;
the network address information is configured IP address information or used proxy IP address information;
the software information is the name and version of the program installed on the virtual machine platform;
the configuration information comprises the core number, the core frequency, the maximum configuration memory, the used memory and the size of the hard disk configured for the virtual machine platform.
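The seven categories of target platform information listed above could be held in a record such as the one below. This is only a sketch: the class names, field names, units and sample values are assumptions chosen for illustration, since the source lists the categories but not a data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HardwareConfig:
    """Configuration information of a virtual machine platform (units are assumed)."""
    cores: int
    core_frequency_mhz: int
    max_memory_mb: int
    used_memory_mb: int
    disk_gb: int

    def free_memory_mb(self) -> int:
        return self.max_memory_mb - self.used_memory_mb


@dataclass
class TargetPlatformInfo:
    runtimes: Dict[str, str]   # runtime name -> version, e.g. {"JRE": "1.8"}
    system: str                # system version, e.g. "Windows 7"
    browsers: Dict[str, str]   # browser name -> version
    client_version: str        # version of the client that polls the database
    network_address: str       # configured IP address or proxy IP address
    software: Dict[str, str] = field(default_factory=dict)  # program name -> version
    hardware: Optional[HardwareConfig] = None


# Sample values are invented for the example.
info = TargetPlatformInfo(
    runtimes={"JRE": "1.8"},
    system="Windows 7",
    browsers={"IE": "7"},
    client_version="1.0.0",
    network_address="192.0.2.10",
    hardware=HardwareConfig(
        cores=2, core_frequency_mhz=2400,
        max_memory_mb=4096, used_memory_mb=1024, disk_gb=60,
    ),
)
free_mb = info.hardware.free_memory_mb()
```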
According to one embodiment of the invention, a compatible version of the required client is acquired according to the target platform information; when software or resources related to the information acquisition task need to run, the target platform information is expanded accordingly and the compatible version of the operating environment that it requires is acquired; all instances of the virtual machine platform are then traversed to obtain the instances that conform to the target platform information.
According to one embodiment of the invention, the virtual machine platforms are traversed according to the compatible version of the required operating environment, the virtual machine platform containing the operating environment with the compatible version is selected, the identifier set of the virtual machine platform is obtained, and the identifier set is added into the database.
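The traversal described above can be sketched as a filter over the registered virtual machine platforms. The dictionary layout (`id`, `environment`) and the exact-match rule for versions are assumptions made for this example; the source does not define how compatibility between versions is decided.

```python
from typing import Dict, Iterable, Set


def select_compatible_vms(
    vms: Iterable[Dict], required: Dict[str, str]
) -> Set[str]:
    """Return the identifier set of VMs whose environment has every required version."""
    identifiers = set()
    for vm in vms:
        environment = vm["environment"]
        # Assumed rule: a VM is compatible when each required component
        # is present with exactly the required version.
        if all(environment.get(name) == version for name, version in required.items()):
            identifiers.add(vm["id"])
    return identifiers


# Invented sample data.
registered_vms = [
    {"id": "vm-01", "environment": {"system": "Windows 10", "browser": "Chrome 105"}},
    {"id": "vm-02", "environment": {"system": "Windows 7", "browser": "IE 7"}},
    {"id": "vm-03", "environment": {"system": "Linux", "browser": "Firefox 102"}},
]
compatible_ids = select_compatible_vms(
    registered_vms, {"system": "Windows 7", "browser": "IE 7"}
)
```

The resulting identifier set is what would then be written to the database alongside the task.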
According to one embodiment of the invention, all unexecuted tasks in the server are acquired, and the identifiers of the virtual machine platforms matched with the unexecuted tasks, together with the frequency of each identifier, are extracted; when the frequency of an identifier exceeds a threshold value, a copy of that virtual machine platform is created through a local VMAPI, the network environment of the copy is configured, and the copy is registered as a new virtual machine platform; the VMAPI is the Hyper-V API or the VMware API.
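The frequency check that triggers the creation of a copy can be sketched as below. Only the counting and threshold logic is shown; the actual cloning through the Hyper-V API or the VMware API is deliberately left out. The task layout (`matching_vm_ids`) and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def vms_to_clone(unexecuted_tasks: Iterable[Dict], threshold: int) -> List[str]:
    """Return identifiers whose frequency among unexecuted tasks exceeds the threshold."""
    frequency = Counter(
        vm_id
        for task in unexecuted_tasks
        for vm_id in task["matching_vm_ids"]
    )
    return sorted(vm_id for vm_id, count in frequency.items() if count > threshold)


# Invented sample data: three waiting tasks, all of which can run on "vm-ie7".
waiting_tasks = [
    {"id": 1, "matching_vm_ids": ["vm-ie7"]},
    {"id": 2, "matching_vm_ids": ["vm-ie7", "vm-chrome"]},
    {"id": 3, "matching_vm_ids": ["vm-ie7"]},
]
clone_candidates = vms_to_clone(waiting_tasks, threshold=2)
```

Each identifier returned would then be passed to whichever VM management interface is available locally in order to create, configure and register the copy.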
According to one embodiment of the invention, the method class is compiled bytecode or a script; when the method class is bytecode, the client on the virtual machine platform loads the compiled bytecode, creates an instance of it and executes the information acquisition operation; when the method class is a script, the corresponding interpreter is called to execute the script; the scripts include Lua scripts, JavaScript (js) scripts and Python scripts.
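The two loading paths can be sketched in Python as follows. Here an importable Python class stands in for "compiled bytecode", and the interpreter commands (`lua`, `node`) are assumptions about what is installed on the virtual machine platform; neither detail comes from the source.

```python
import importlib
import subprocess
import sys
from typing import Any, Sequence

# Assumed interpreter commands; adjust to the runtimes present on the VM.
INTERPRETERS = {
    "python": [sys.executable],
    "lua": ["lua"],
    "js": ["node"],
}


def run_method_class(kind: str, target: str, args: Sequence[Any] = ()) -> Any:
    """Load and instantiate a compiled class, or run a script with its interpreter.

    kind   -- "bytecode", "python", "lua" or "js"
    target -- dotted "module.ClassName" for bytecode, or a script path
    """
    if kind == "bytecode":
        module_name, class_name = target.rsplit(".", 1)
        cls = getattr(importlib.import_module(module_name), class_name)
        return cls(*args)  # the instance that performs the acquisition
    if kind in INTERPRETERS:
        completed = subprocess.run(
            INTERPRETERS[kind] + [target],
            capture_output=True, text=True, check=True,
        )
        return completed.stdout
    raise ValueError(f"unsupported method class kind: {kind!r}")


# Usage: a standard-library class is used purely as a stand-in for a method class.
instance = run_method_class("bytecode", "collections.OrderedDict")
```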
According to one embodiment of the invention, the simulated user operation events include moving the mouse, pressing a mouse button, clicking the mouse, releasing a mouse button, pressing a key, releasing a key, sending a keyboard message to a window, and sending a mouse message to a window.
According to one embodiment of the invention, the client captures a screen image and judges whether the captured screen region includes the element into which information is to be input; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and enters the characters specified in the acquisition task by simulating keyboard input events.
According to one embodiment of the invention, the client captures a screen image and judges whether, after the keyboard input event has been simulated, the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
According to one embodiment of the invention, the client captures a screen image and judges whether the captured screen region includes the element on which input is to be performed; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and performs the mouse operations specified in the acquisition task by simulating mouse input.
According to one embodiment of the invention, the client captures a screen image and judges whether, after the mouse input event has been simulated, the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
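The locate, act and verify sequence shared by the keyboard and mouse embodiments above can be sketched as one function. Screen capture, element location and input simulation are passed in as callables because the source does not name a concrete library; `FakeScreen`, `ElementNotFound` and all other names are hypothetical and exist only to make the example runnable.

```python
from typing import Any, Callable, Optional, Tuple

Position = Tuple[int, int]


class ElementNotFound(Exception):
    """Raised when the target element is not in the captured screen image."""


def input_and_verify(
    capture: Callable[[], Any],
    locate: Callable[[Any], Optional[Position]],
    move_mouse: Callable[[Position], None],
    send_input: Callable[[Any], None],
    payload: Any,
) -> bool:
    """Locate the element, simulate the input, and report whether the screen changed."""
    before = capture()
    position = locate(before)
    if position is None:
        raise ElementNotFound("element to receive input is not on screen")
    move_mouse(position)
    send_input(payload)   # simulated keyboard or mouse events
    after = capture()
    return after != before  # simplified stand-in for the required change check


class FakeScreen:
    """Stand-in for a real desktop, used only to exercise the flow."""

    def __init__(self) -> None:
        self.text = ""
        self.cursor: Optional[Position] = None

    def capture(self) -> str:
        return "[input]" + self.text

    def locate(self, image: str) -> Optional[Position]:
        return (10, 20) if "[input]" in image else None

    def move_mouse(self, position: Position) -> None:
        self.cursor = position

    def type_text(self, characters: str) -> None:
        self.text += characters


screen = FakeScreen()
changed = input_and_verify(
    screen.capture, screen.locate, screen.move_mouse, screen.type_text, "2022"
)
```

A real check would compare the change against what the task expects rather than testing for any difference; the simple inequality here is a placeholder.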
According to one embodiment of the invention, the client simulates a mouse or a keyboard to select the content rendered on a control in the screen image, copies the selected content to the clipboard, and reads and saves the data in the clipboard.
According to one embodiment of the invention, the client copies the captured screen image to the clipboard, recognizes the screen image to obtain the content in it, and saves the data.
According to one embodiment of the invention, the client records the active time of the client and the virtual machine platform when executing the information acquisition task.
According to one embodiment of the invention, all unexecuted tasks in the server and all virtual machine platforms matched with the unexecuted tasks are acquired, the difference set is acquired, elements in the difference set are traversed, and if the idle time of the virtual machine platforms in the difference set exceeds a threshold value, the virtual machines are closed through the local VMAPI.
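The scale-down rule can be sketched with a set difference. This reads "the difference set" as the registered virtual machine platforms minus those matched with any unexecuted task, which is an interpretation; the data layout, the seconds unit and the function name are likewise assumptions for the example.

```python
from typing import Dict, Iterable, List


def vms_to_shut_down(
    all_vm_ids: Iterable[str],
    unexecuted_tasks: Iterable[Dict],
    idle_seconds: Dict[str, float],
    idle_threshold: float,
) -> List[str]:
    """Return VMs that no waiting task needs and that have been idle too long."""
    needed = {
        vm_id
        for task in unexecuted_tasks
        for vm_id in task["matching_vm_ids"]
    }
    unneeded = set(all_vm_ids) - needed  # the difference set
    return sorted(
        vm_id for vm_id in unneeded
        if idle_seconds.get(vm_id, 0.0) > idle_threshold
    )


# Invented sample data.
shutdown_candidates = vms_to_shut_down(
    all_vm_ids=["vm-a", "vm-b", "vm-c"],
    unexecuted_tasks=[{"id": 1, "matching_vm_ids": ["vm-a"]}],
    idle_seconds={"vm-a": 9000.0, "vm-b": 7200.0, "vm-c": 60.0},
    idle_threshold=3600.0,
)
```

Note that `vm-a` is kept even though it has been idle longest, because a waiting task still matches it.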
According to one embodiment of the invention, the data source 1 is information disclosed by a conventional government website, the content of a page can be obtained by accessing an address, the page can be accessed by clicking a link, and the page is compatible with a common browser;
the data source 2 is information disclosed by a conventional government website, page information cannot be accessed under the condition of no login, the content of a page can be obtained by accessing an address after login, the page can be accessed by clicking a link, and the page is compatible with a common browser;
the data source 3 is a website that is compatible only with old versions of IE (lower than IE 8) and not with mainstream browsers, so an operating system consistent with the compatible browser needs to be configured to obtain correctly displayed content;
the data source 4 is information disclosed by a conventional government website, page information cannot be accessed under the condition of no login, the content of a page can be obtained by accessing an address after login, the page can be accessed by clicking a link, but the target content does not belong to conventional text content, but is image content, and the page is compatible with a common browser;
the data source 5 is information disclosed by a conventional government website, page information cannot be accessed under the condition of no login, the content of a page can be obtained by accessing an address after login, the page can be accessed by clicking a link, but the target content does not belong to conventional text content, but is contained in the link of the page, and the page is compatible with a common browser;
the data source 6 is information disclosed by a government department App, page information cannot be accessed under the condition of no login, the content of a page can be obtained by accessing an address after login, the page can be accessed by clicking a link, but the target content does not belong to conventional text content and requires post-processing of a screen capture.
Multiple virtual machine platforms are used to run multiple RPA robots; tasks are allocated to the virtual machine platforms through the database, and information acquisition is driven by adding, deleting and modifying the RPA robots' tasks in the database. At the same time, configuring multiple virtual machine platforms locally allows one server to provide different functions;
screenshots apply to data sources 4-6, and keyboard and mouse events apply to data sources 1-3, because for those sources the text can be copied through keyboard and mouse events.
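The mapping from the six data sources to the two acquisition modes stated above can be written down directly. The mode labels and the function name are invented for this sketch; only the grouping (sources 1-3 versus 4-6) comes from the text.

```python
KEYBOARD_MOUSE = "keyboard_mouse_copy"  # copy text through keyboard and mouse events
SCREENSHOT = "screenshot"               # capture the screen, then recognize the content


def acquisition_mode(data_source: int) -> str:
    """Pick the acquisition mode for data sources 1 to 6 as grouped in the text."""
    if data_source in (1, 2, 3):
        return KEYBOARD_MOUSE
    if data_source in (4, 5, 6):
        return SCREENSHOT
    raise ValueError(f"unknown data source: {data_source}")


modes = {source: acquisition_mode(source) for source in range(1, 7)}
```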
According to one embodiment of the invention, images of different systems can be configured on the host, and each image is configured with a different system, client and runtime, so that different information can be acquired. If the real-time resources of the virtual machine platforms cannot meet demand, the information acquisition capability is enhanced by creating a new virtual machine platform.
According to one embodiment of the invention, the task configuration of the virtual machine platform is determined from the information acquisition task and the task requirements in the database; the virtual machine platforms that meet this configuration are identified; the predicted completion time of the task is determined from the virtual machine platforms that meet the configuration; and if the predicted completion time is greater than the information acquisition threshold (for example, 24 hours), a new virtual machine platform meeting the task configuration is created.
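The completion-time check can be sketched as below. The source does not say how the predicted completion time is computed, so this example assumes that the pending work is spread evenly over the virtual machine platforms that meet the configuration; the function name and the hour-based estimates are also assumptions.

```python
from typing import Sequence


def needs_new_vm(
    pending_task_hours: Sequence[float],
    eligible_vm_count: int,
    threshold_hours: float = 24.0,
) -> bool:
    """Decide whether a new VM meeting the task configuration should be created."""
    if eligible_vm_count == 0:
        # No platform meets the configuration, so one has to be created.
        return True
    # Assumed estimate: total pending work divided evenly among eligible VMs.
    predicted_hours = sum(pending_task_hours) / eligible_vm_count
    return predicted_hours > threshold_hours


overloaded = needs_new_vm([10.0, 20.0, 30.0], eligible_vm_count=2)   # 30 h predicted
sufficient = needs_new_vm([10.0, 20.0, 30.0], eligible_vm_count=3)   # 20 h predicted
```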
Based on the above, the beneficial effects of the invention are as follows. E-government data is acquired in a non-invasive manner, which avoids the errors that other acquisition approaches may cause (for example, the requester being judged to be a robot and receiving 400- or 500-type error messages). The acquisition process runs on virtual machine resources, which can be expanded dynamically on demand and therefore scale well; an expanded virtual machine can process information directly, which reduces configuration difficulty. Because the compatibility between the client on a running virtual machine, its environment and the information acquisition task is checked, tasks can be allocated and scheduled through the database, which avoids unreasonable task allocation. By monitoring resource occupancy, capacity can be expanded and contracted dynamically and resource scheduling is optimized: virtual machines are added when resources are insufficient, and are shut down or released when idle.
Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and devices may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one position, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, each functional module in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention according to the present application is not limited to the specific combination of the above-mentioned features, but also covers other embodiments where any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
It should be understood that the order of execution of the steps in the summary of the invention and the embodiments of the present invention does not absolutely imply any order of execution, and the order of execution of the steps should be determined by their functions and inherent logic, and should not be construed as limiting the process of the embodiments of the present invention.

Claims (16)

1. A method of non-intrusive government data acquisition, comprising:
acquiring target platform information of an execution platform according to an information acquisition task, and acquiring a compatible version of a required operating environment according to the target platform information;
configuring a virtual machine platform with a compatible version according to the information acquisition task, and updating an identifier of the virtual machine platform and a polling database of the virtual machine platform;
acquiring the information acquisition task and a method class matched with the information acquisition task, and loading and instantiating the method class;
simulating a user operation event by using the instantiated method class and, in response to the user operation event, acquiring information by capturing the screen or by copying text through keyboard and mouse events;
and updating the state of the information acquisition task according to the information acquired by the virtual machine platform.
2. The method for acquiring non-invasive government data according to claim 1, wherein the target platform information comprises runtime information, system version information, browser information, client information, network address information, software information, configuration information;
the runtime information comprises JRE, .NET Framework, Python runtime and Lua runtime;
the system version information comprises Windows, Linux and NeoKylin;
the browser information comprises IE, Firefox, Edge, Chrome, 360, Opera, Tencent browser and Baidu browser;
the client information comprises the version of a client, and the client is a client installed on the virtual machine platform and used for acquiring the instruction in the polling database;
the network address information is configured IP address information or used proxy IP address information;
the software information is the name and version of the program installed on the virtual machine platform;
the configuration information comprises the core number, the core frequency, the maximum configuration memory, the used memory and the size of a hard disk configured for the virtual machine platform.
3. A non-invasive government data acquisition method according to claim 2, wherein,
a compatible version of the required client is acquired according to the target platform information; when software or resources related to the information acquisition task need to run, the target platform information is expanded and the compatible version of the operating environment that it requires is acquired; and all instances of the virtual machine platform are traversed to obtain the instances conforming to the target platform information.
4. A method as claimed in claim 3, wherein the virtual machine platforms are traversed according to compatible versions of the required operating environments, the virtual machine platform with the operating environment of the compatible version is selected, and the identifier set of the virtual machine platform is obtained and added to the database.
5. A non-invasive government data acquisition method according to claim 4, wherein all unexecuted tasks in the server are acquired, the identifiers of the virtual machine platforms matched with the unexecuted tasks and the frequency of each identifier are extracted, and when the frequency of an identifier exceeds a threshold value, a copy of that virtual machine platform is created through a local VMAPI, the network environment of the copy is configured, and the copy is registered as a new virtual machine platform; the VMAPI is the Hyper-V API or the VMware API.
6. The method for acquiring non-invasive government data according to claim 5, wherein the method class is compiled bytecode or a script; when the method class is bytecode, the client on the virtual machine platform loads the compiled bytecode, creates an instance thereof, and performs an information acquisition operation; when the method class is a script, the corresponding interpreter is called to execute the script; the scripts include Lua scripts, JavaScript (js) scripts and Python scripts.
7. A non-invasive government data acquisition method according to claim 6, wherein the simulated user operation events include moving the mouse, pressing a mouse button, clicking the mouse, releasing a mouse button, pressing a key, releasing a key, sending a keyboard message to a window, and sending a mouse message to a window.
8. A non-invasive government affair data obtaining method according to claim 7, wherein,
the client captures a screen image and judges whether the captured screen region includes the element into which information is to be input; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and enters the characters specified in the acquisition task by simulating keyboard input events.
9. The method for acquiring non-invasive government data according to claim 8, wherein the client captures a screen image and judges whether, after the keyboard input event has been simulated, the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
10. A non-invasive government data acquisition method according to claim 9, wherein,
the client captures a screen image and judges whether the captured screen region includes the element on which input is to be performed; if not, it throws an exception and returns; if so, it moves the mouse to the position of the element and performs the mouse operations specified in the acquisition task by simulating mouse input.
11. A non-invasive government data acquisition method according to claim 10, wherein,
the client captures a screen image and judges whether, after the mouse input event has been simulated, the content of the screen image has changed in the way the information acquisition task requires; such changes include a change in the screenshot and updates to other elements within the region.
12. The method according to claim 11, wherein the client simulates a mouse or a keyboard to select the content rendered on a control in the screen image, copies the selected content to the clipboard, and reads and saves the data in the clipboard.
13. The method as claimed in claim 12, wherein the client copies the captured screen image to the clipboard, recognizes the screen image to obtain the content in it, and saves the data.
14. A non-invasive government data acquisition method according to claim 13, wherein,
and the client records the active time of the client and the virtual machine platform when executing the information acquisition task.
15. A non-invasive government data acquisition method according to claim 14, wherein,
acquiring all unexecuted tasks in the server and acquiring all the virtual machine platforms matched with the unexecuted tasks, acquiring a difference set of the tasks, traversing elements in the difference set, and closing the virtual machine through a local VMAPI (virtual machine application program interface) if the idle time of the virtual machine platforms in the difference set exceeds a threshold value.
16. A non-invasive government data acquisition system, characterized by comprising:
an operating environment acquisition module: acquiring target platform information of an execution platform according to an information acquisition task, and acquiring a compatible version of a required operating environment according to the target platform information;
a virtual machine platform establishing module: configuring a virtual machine platform with a compatible version according to the information acquisition task, and updating an identifier of the virtual machine platform and a polling database of the virtual machine platform;
a method class processing module: acquiring the information acquisition task and a method class matched with the information acquisition task, and loading and instantiating the method class;
an information acquisition module: simulating a user operation event by using the instantiated method class and, in response to the user operation event, acquiring information by capturing the screen or by copying text through keyboard and mouse events;
and a task updating module: updating the state of the information acquisition task according to the information acquired by the virtual machine platform.
CN202211219122.0A 2022-10-08 2022-10-08 Non-invasive government data acquisition method Pending CN115292647A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211219122.0A CN115292647A (en) 2022-10-08 2022-10-08 Non-invasive government data acquisition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211219122.0A CN115292647A (en) 2022-10-08 2022-10-08 Non-invasive government data acquisition method

Publications (1)

Publication Number Publication Date
CN115292647A true CN115292647A (en) 2022-11-04

Family

ID=83834882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211219122.0A Pending CN115292647A (en) 2022-10-08 2022-10-08 Non-invasive government data acquisition method

Country Status (1)

Country Link
CN (1) CN115292647A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999007109A1 (en) * 1997-07-31 1999-02-11 Crosskeys Systems Corporation Third party management platforms integration
CN106095918A (en) * 2016-06-06 2016-11-09 山东科技大学 A kind of acquisition methods of the protected exponent data of network based on OCR technique
CN110073301A (en) * 2017-08-02 2019-07-30 强力物联网投资组合2016有限公司 The detection method and system under data collection environment in industrial Internet of Things with large data sets
US20210389846A1 (en) * 2020-06-10 2021-12-16 Microsoft Technology Licensing, Llc Systems and methods for viewing incompatible web pages via remote browser instances
CN112269627A (en) * 2020-09-21 2021-01-26 西安万像电子科技有限公司 Data processing method, device and system
CN112800311A (en) * 2021-02-05 2021-05-14 厦门市美亚柏科信息股份有限公司 Browser page data acquisition method, terminal device and storage medium
CN114693262A (en) * 2022-03-29 2022-07-01 李林 Smart city information grid operating system
CN114819889A (en) * 2022-04-13 2022-07-29 北京来也网络科技有限公司 RPA and AI combined e-commerce data acquisition method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李凤生等: "淮委综合应用门户整合的设计与实现", 《水利信息化》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116257005A (en) * 2023-01-25 2023-06-13 杭州银湖冠天智能科技有限公司 System for non-invasive access CIM control of island equipment
CN116257005B (en) * 2023-01-25 2023-10-10 杭州银湖冠天智能科技有限公司 System for non-invasive access CIM control of island equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination