CN113807829A - Information management method and system based on deep reinforcement learning model - Google Patents

Information management method and system based on deep reinforcement learning model

Info

Publication number
CN113807829A
CN113807829A (application CN202111372958.XA)
Authority
CN
China
Prior art keywords
social security
employee
information
learning model
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111372958.XA
Other languages
Chinese (zh)
Inventor
刘涛
郑维
王勇飞
王义勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guoneng Daduhe Big Data Service Co ltd
Original Assignee
Guoneng Daduhe Big Data Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guoneng Daduhe Big Data Service Co ltd filed Critical Guoneng Daduhe Big Data Service Co ltd
Priority to CN202111372958.XA priority Critical patent/CN113807829A/en
Publication of CN113807829A publication Critical patent/CN113807829A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/105Human resources
    • G06Q10/1057Benefits or employee welfare, e.g. insurance, holiday or retirement packages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application discloses an information management method and system based on a deep reinforcement learning model, belonging to the technical field of data management. The information management system based on the deep reinforcement learning model comprises a data acquisition module, a state comparison module, and a social security adjustment module. The data acquisition module is used for acquiring the employee social security information of the current period and the employee social security information of the previous period; the state comparison module is used for comparing the employee social security information of the current period with that of the previous period and judging whether the employee social security information has changed; and the social security adjustment module is used for adjusting the employee social security payment state through the deep reinforcement learning model, based on the employee social security information of the current period, when the state comparison module judges that the employee social security information has changed.

Description

Information management method and system based on deep reinforcement learning model
Technical Field
The invention mainly relates to the technical field of data management, in particular to an information management method and system based on a deep reinforcement learning model.
Background
Paying social insurance is a task that every employer must perform; it is strongly policy-driven, subject to regional differences in national social insurance policy, and complex. When an employee's job changes, the social security relationship must be adjusted accordingly. In the prior art, social security data are mainly checked and tallied manually, so the efficiency of verifying the information is relatively low.
Therefore, it is desirable to provide an information management method and system based on a deep reinforcement learning model for adjusting employee social security in a timely manner.
Disclosure of Invention
One of the embodiments of the present specification provides an information management system based on a deep reinforcement learning model, including: a data acquisition module for acquiring the employee social security information of the current period and the employee social security information of the previous period; a state comparison module for comparing the employee social security information of the current period with that of the previous period and judging whether the employee social security information has changed; and a social security adjustment module for adjusting the employee social security payment state through a deep reinforcement learning model, based on the employee social security information of the current period, when the state comparison module judges that the employee social security information has changed.
In some embodiments, the employee social security information includes employee-related information, including the city where the employee's affiliated personnel unit is located, the affiliated post, the wage, and the working state; the employee social security information further comprises social security payment standard information of a plurality of regions.
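The record layout implied above can be sketched as plain data classes; every field name below is an illustrative assumption, not a schema given in the application.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    """Employee-related information named in the description.

    Field names are illustrative assumptions, not the patent's schema."""
    city: str       # city where the employee's affiliated personnel unit is located
    post: str       # affiliated post
    wage: float     # wage
    working: bool   # working state (True = in service)

@dataclass
class RegionStandard:
    """Social security payment standard information for one region."""
    region: str
    payment_base: float  # payment base for the region
    proportion: float    # payment proportion, e.g. 0.08 for 8%
```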
In some embodiments, the data acquisition module acquires social security payment standard information of the plurality of regions of the current period based on a web crawler.
In some embodiments, the data acquisition module obtains the social security payment standard information of the plurality of regions for the current period based on a web crawler, including: acquiring a parent URL of at least one initial webpage issued in the current period; extracting at least one child URL from the parent URL of the at least one initial webpage; determining at least one target child URL from the at least one child URL based on the publishing time of the at least one child URL and the correlation between the webpage content of the at least one child URL and a preset topic; and performing data capture on the parent URL of the at least one initial webpage and the at least one target child URL to acquire the social security payment standard information of the plurality of regions for the current period.
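The four acquisition steps above can be sketched as follows; `fetch`, `extract_links`, and `relevance` are caller-supplied placeholders (assumptions), since the application does not name a concrete crawling API.

```python
from datetime import date

def crawl_standards(initial_pages, period_start, fetch, extract_links,
                    relevance, threshold=0.5):
    """Sketch of the four steps: collect parent URLs issued in the current
    period, extract child URLs from them, keep the child URLs that are both
    recent and on-topic, then capture data from parents and targets."""
    # Step 1: parent URLs of initial webpages issued in the current period.
    parents = [p["url"] for p in initial_pages if p["published"] >= period_start]
    targets = []
    for purl in parents:
        # Step 2: extract child URLs linked from the initial webpage.
        for child in extract_links(fetch(purl)):
            # Step 3: filter by publishing time and topic relevance.
            if (child["published"] >= period_start
                    and relevance(child["url"]) > threshold):
                targets.append(child["url"])
    # Step 4: data capture on parent URLs and target child URLs.
    return [fetch(u) for u in parents + targets]
```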
In some embodiments, the data acquisition module obtains the social security payment standard information of the plurality of regions for the current period based on a web crawler, further including: deduplicating the parent URL of the at least one initial webpage and the at least one target child URL before performing data capture on them.
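The deduplication step can be sketched as an order-preserving filter; the trailing-slash normalisation is an assumption about what counts as a duplicate URL.

```python
def dedupe_urls(urls):
    """Remove duplicate URLs while preserving first-seen order, so each
    page is fetched at most once during data capture."""
    seen = set()
    unique = []
    for url in urls:
        # Treat URLs differing only by a trailing slash as duplicates
        # (an assumption; real crawlers may canonicalise further).
        key = url.rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```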
In some embodiments, the state comparison module compares the employee social security information of the current period with that of the previous period and determines whether the employee social security information has changed, including: judging whether the city where the employee's affiliated personnel unit is located, the post, the wage, and the working state in the current period are consistent with those of the previous period; if not, judging that the employee social security information has changed; if so, further judging whether the social security payment standard information of the region corresponding to that city is consistent between the current period and the previous period; and if it is inconsistent, judging that the employee social security information has changed.
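The two-stage comparison above can be sketched as follows, with records represented as plain dictionaries whose keys are illustrative assumptions.

```python
def social_security_changed(curr, prev, curr_std, prev_std):
    """First compare the employee-related fields; only if they all match,
    compare the payment standard of the region for the employee's city."""
    fields = ("city", "post", "wage", "working")
    if any(curr[f] != prev[f] for f in fields):
        return True  # employee-related information changed
    # Employee fields match: check the region's payment standard.
    return curr_std.get(curr["city"]) != prev_std.get(prev["city"])
```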
In some embodiments, the deep reinforcement learning model comprises at least one hidden layer, wherein the hidden layer is a long short-term memory (LSTM) network; and the output layer of the deep reinforcement learning model is a fully-connected neural network.
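A minimal numerical sketch of such a model, assuming a single LSTM hidden layer followed by a fully-connected output layer that emits one Q-value per action (the weights here are untrained placeholders, and the shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of the LSTM hidden layer.
    Shapes: x (d,), h and c (n,), W (4n, d), U (4n, n), b (4n,)."""
    z = W @ x + U @ h + b
    n = h.shape[0]
    i = sigmoid(z[:n])          # input gate
    f = sigmoid(z[n:2 * n])     # forget gate
    g = np.tanh(z[2 * n:3 * n]) # candidate cell state
    o = sigmoid(z[3 * n:])      # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def q_values(state_seq, params):
    """Run the state sequence through the LSTM hidden layer, then map the
    final hidden state through a fully-connected output layer to one
    Q-value per action.  Training is out of scope for this sketch."""
    W, U, b, Wo, bo = params
    n = U.shape[1]
    h, c = np.zeros(n), np.zeros(n)
    for x in state_seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return Wo @ h + bo  # fully-connected output layer
```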
In some embodiments, the social security adjustment module adjusts the employee social security payment state based on the employee social security information of the current period through a deep reinforcement learning model, including: taking the employee social security information of the current period as the input of the deep reinforcement learning model; and adjusting the employee social security payment state through the optimal action-value function.
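The adjustment via the optimal action-value function can be sketched as a greedy selection over estimated Q-values; the action set below is an illustrative assumption, not one enumerated in the application.

```python
def adjust_payment_state(q_values, actions):
    """Greedy step of the optimal action-value function: choose the
    adjustment action with the highest estimated Q-value."""
    best = max(range(len(actions)), key=lambda i: q_values[i])
    return actions[best]
```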
In some embodiments, the system further comprises a data pushing module and at least one social security check terminal; the data pushing module is used for sending the employee social security payment state of the previous period and the adjusted employee social security payment state to the at least one social security check terminal after the social security adjustment module adjusts the employee social security payment state.
One embodiment of the present specification provides an information management method based on a deep reinforcement learning model.
Drawings
The present application will be further explained by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in these embodiments, like numerals indicate like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of an information management system based on a deep reinforcement learning model according to some embodiments of the present application;
FIG. 2 is an exemplary block diagram of a computing device shown in accordance with some embodiments of the present application;
FIG. 3 is an exemplary block diagram of an information management system based on a deep reinforcement learning model according to some embodiments of the present application;
FIG. 4 is an exemplary flow chart of a method for information management based on a deep reinforcement learning model according to some embodiments of the present application.
In the figures: 100, information management system based on a deep reinforcement learning model; 110, processing device; 120, network; 130, social security check terminal; 140, storage device; 200, computing device; 210, processor; 220, read-only memory; 230, random access memory; 240, communication port; 250, input/output interface; 260, hard disk.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some examples or embodiments of the application; based on these drawings, a person skilled in the art could apply the application to other similar scenarios without inventive effort. It should be understood that these exemplary embodiments are given solely to enable those skilled in the relevant art to better understand and implement the present invention, and are not intended to limit the scope of the invention in any way. Unless it is obvious from the context or otherwise indicated, like reference numerals in the figures refer to the same structure or operation.
It should be understood that "system", "apparatus", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Although various references are made herein to certain modules or units in a system according to embodiments of the present application, any number of different modules or units may be used and run on a client and/or server. The modules are merely illustrative and different aspects of the systems and methods may use different modules.
Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order in which they are performed. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or a certain step or several steps of operations may be removed from the processes.
Fig. 1 is a schematic diagram of an application scenario of an information management system based on a deep reinforcement learning model according to some embodiments of the present application.
As shown in fig. 1, the information management system 100 based on the deep reinforcement learning model may include a processing device 110, a network 120, a social security check terminal 130, and a storage device 140.
In some embodiments, the deep reinforcement learning model-based information management system 100 may assist a personnel entity in paying employee social security. In some embodiments, the deep reinforcement learning model-based information management system 100 may adjust the employee social security payment status in a timely manner when the employee social security information changes. It should be noted that the information management system 100 based on the deep reinforcement learning model may also be applied to other devices, scenarios, and applications that require human resource management; this is not limited herein, and any device, scenario, and/or application that may use the information management method based on the deep reinforcement learning model of the present application is within the scope of the present application.
In some embodiments, the processing device 110 may be used to process information and/or data related to adjusting employee social security. For example, the processing device 110 may obtain the employee social security information of the current period and of the previous period. As another example, the processing device 110 may compare the employee social security information of the current period with that of the previous period and determine whether the employee social security information has changed. As a further example, when the state comparison module determines that the employee social security information has changed, the processing device 110 may adjust the employee social security payment status through the deep reinforcement learning model based on the employee social security information of the current period. Further description of the processing device 110 may be found elsewhere in the present application, e.g., figs. 2 and 3 and their description.
In some embodiments, the processing device 110 may be local or remote. For example, the processing device 110 may access information and/or data stored in the social security check terminal 130 and the storage device 140 via the network 120. In some embodiments, the processing device 110 may be directly connected to the social security check terminal 130 and the storage device 140 to access the information and/or data stored therein. In some embodiments, the processing device 110 may execute on a cloud platform. For example, the cloud platform may include one or any combination of a private cloud, a public cloud, a hybrid cloud, a community cloud, a decentralized cloud, an internal cloud, and the like.
In some embodiments, the processing device 110 may include a processor 210, and the processor 210 may include one or more sub-processors (e.g., a single-core processing device or a multi-core processing device). Merely by way of example, a processor may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or the like, or any combination thereof.
The network 120 can facilitate the exchange of data and/or information in the deep reinforcement learning model-based information management system 100. In some embodiments, one or more components (e.g., the processing device 110, the social security check terminal 130, and the storage device 140) in the deep reinforcement learning model-based information management system 100 may send data and/or information to other components in the deep reinforcement learning model-based information management system 100 via the network 120. For example, the employee social security information of the last period stored by the storage device 140 may be transmitted to the processing device 110 via the network 120. For another example, the processing device 110 may transmit the adjusted employee social security payment status to the social security check terminal 130 of the target employee through the network 120. In some embodiments, the network 120 may be any type of wired or wireless network. For example, network 120 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, the like, or any combination thereof. In some embodiments, network 120 may include one or more network access points. For example, the network 120 may include wired or wireless network access points, such as base stations and/or Internet switching points, through which one or more components of the deep reinforcement learning model-based information management system 100 may connect to the network 120 to exchange data and/or information.
In some embodiments, the social security check terminal 130 may obtain information or data in the deep reinforcement learning model-based information management system 100. In some embodiments, the person managing the employee social security payment may obtain the adjusted employee social security payment status through the social security check terminal 130. In some embodiments, the social security check terminal 130 may include one or any combination of a mobile device, a tablet, a laptop, and the like. In some embodiments, the mobile device may include a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, and the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart glasses, a smart helmet, a smart watch, a smart backpack, a smart handle, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smart phone, a Personal Digital Assistant (PDA), a gaming device, a navigation device, a POS device, and the like, or any combination thereof.
In some embodiments, the storage device 140 may be connected to the network 120 to enable communication with one or more components of the deep reinforcement learning model-based information management system 100 (e.g., the processing device 110, the social security check terminal 130, etc.). One or more components of the deep reinforcement learning model-based information management system 100 may access data or instructions stored in the storage device 140 via the network 120. In some embodiments, the storage device 140 may be directly connected to or in communication with one or more components (e.g., the processing device 110, the social security check terminal 130) in the deep reinforcement learning model-based information management system 100. In some embodiments, the storage device 140 may be part of the processing device 110. In some embodiments, the processing device 110 may also be located in the social security check terminal 130.
It should be noted that the foregoing description is provided for illustrative purposes only and is not intended to limit the scope of the present application. Many variations and modifications will occur to those skilled in the art in light of the teachings herein. The features, structures, methods, and other characteristics of the exemplary embodiments described herein may be combined in various ways to obtain additional and/or alternative exemplary embodiments. For example, the storage device 140 may be a data storage device on a cloud computing platform, such as a public cloud, a private cloud, a community cloud, a hybrid cloud, and the like. However, such changes and modifications do not depart from the scope of the present application.
FIG. 2 is an exemplary block diagram of a computing device shown in accordance with some embodiments of the present application.
In some embodiments, the processing device 110 and/or the social security check terminal 130 may be implemented on the computing device 200. For example, the processing device 110 may implement and execute the methods disclosed herein on the computing device 200.
As shown in fig. 2, computing device 200 may include a processor 210, a read only memory 220, a random access memory 230, a communication port 240, an input/output interface 250, and a hard disk 260.
The processor 210 may execute computing instructions (program code) and perform the functions of the deep reinforcement learning model-based information management system 100 described herein. The computing instructions may include programs, objects, components, data structures, procedures, modules, and functions (the latter referring to the specific functions described herein). For example, the processor 210 may process the employee information of a plurality of sample employees obtained from the storage device 140 of the deep reinforcement learning model-based information management system 100. In some embodiments, the processor 210 may include microcontrollers, microprocessors, reduced instruction set computers (RISC), application-specific integrated circuits (ASIC), application-specific instruction-set processors (ASIP), central processing units (CPU), graphics processing units (GPU), physics processing units (PPU), microcontroller units, digital signal processors (DSP), field-programmable gate arrays (FPGA), advanced RISC machines (ARM), programmable logic devices, any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustration only, the computing device 200 in fig. 2 depicts only one processor, but it should be noted that the computing device 200 in the present application may also include multiple processors.
The memory (e.g., read-only memory (ROM) 220, random access memory (RAM) 230, hard disk 260, etc.) of the computing device 200 may store data/information obtained from any other component of the deep reinforcement learning model-based information management system 100, for example, the employee information of a plurality of sample employees obtained from the storage device 140. As another example, the storage device 140 may store instructions for determining a target work task from a plurality of work tasks. Exemplary ROM may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM, and the like. Exemplary RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), and the like.
The input/output interface 250 may be used to input or output signals, data, or information. In some embodiments, the input/output interface 250 may enable an employee to interact with the deep reinforcement learning model-based information management system 100. For example, an employee may receive information regarding a target work task sent by the processing device 110 via the input/output interface 250; an employee may also send feedback to the processing device 110 via the input/output interface 250. In some embodiments, the input/output interface 250 may include an input device and an output device. Exemplary input devices may include a keyboard, a mouse, a touch screen, a microphone, and the like, or any combination thereof. Exemplary output devices may include a display device, speakers, a printer, a projector, etc., or any combination thereof. Exemplary display devices may include liquid crystal displays (LCDs), light-emitting diode (LED) based displays, flat panel displays, curved displays, television equipment, cathode ray tubes (CRTs), and the like, or any combination thereof.

The communication port 240 may be connected to a network for data communication. The connection may be a wired connection, a wireless connection, or a combination of both. The wired connection may include an electrical cable, an optical cable, a telephone line, or the like, or any combination thereof. The wireless connection may include Bluetooth, Wi-Fi, WiMax, WLAN, ZigBee, mobile networks (e.g., 3G, 4G, or 5G), and the like, or any combination thereof. In some embodiments, the communication port 240 may be a standardized port, such as RS232, RS485, and the like. In some embodiments, the communication port 240 may be a specially designed port.
For purposes of illustration, only one central processing unit and/or processor is depicted for the computing device 200. However, it should be noted that the computing device 200 in the present application may include multiple central processing units and/or processors; thus, the operations and/or methods described in the present application as implemented by one central processing unit and/or processor may also be implemented by multiple central processing units and/or processors, collectively or independently. For example, the central processing unit and/or processor of the computing device 200 may perform steps A and B. As another example, steps A and B may be performed by two different central processing units and/or processors in the computing device 200, either jointly or separately (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together).
FIG. 3 is an exemplary block diagram of an information management system 100 based on a deep reinforcement learning model according to some embodiments of the present application.
As shown in fig. 3, an information management system 100 based on a deep reinforcement learning model may include a data obtaining module, a state comparison module, a social security adjustment module, and a data pushing module. In some embodiments, the data acquisition module, the status comparison module, the social security adjustment module, and the data push module may be implemented on the processing device 110 or the computing device 200.
In some embodiments, the data acquisition module may be configured to acquire the employee social security information of the current period and the employee social security information of the previous period.
In some embodiments, a period may represent a span of time, e.g., a month, a quarter, a year, etc. In some embodiments, the employee social security information may include employee-related information, which may include the city where the employee's affiliated personnel unit is located, the affiliated post, the wage, and the working state. In some embodiments, the employee social security information may further include social security payment standard information of a plurality of regions. Endowment insurance is coordinated at the provincial level, with the province as the unit; the other four insurances (i.e., medical insurance, unemployment insurance, work-related injury insurance, and maternity insurance) are coordinated at the city level, and a region may be a certain city in China (e.g., Chengdu, Chongqing, etc.). In some embodiments, the social security payment standard information may include a social security payment base and proportion for each of the plurality of regions.
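Assuming the common scheme in which the payment base is the wage clamped to a regional floor and ceiling (an assumption, since the application does not spell out the formula), a monthly contribution can be computed from the base and proportion as:

```python
def monthly_contribution(wage, base_floor, base_ceiling, proportion):
    """Contribution for one insurance item in one region: the payment base
    is the wage clamped to the region's floor/ceiling, times the payment
    proportion.  The clamping rule is an illustrative assumption."""
    base = min(max(wage, base_floor), base_ceiling)
    return round(base * proportion, 2)
```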
In some embodiments, the data acquisition module may acquire social security payment criterion information of a plurality of regions of the current period based on a web crawler. In some embodiments, the data obtaining module obtains social security payment standard information of a plurality of regions of the current period based on the web crawler, and may include:
acquiring a parent URL (Uniform Resource Locator) of at least one initial webpage issued in the current period, where an initial webpage may be a webpage whose correlation with a preset topic is greater than a preset correlation threshold. The crawler may deconstruct the webpage's data according to the tags of its items, extract the useful data, and perform word segmentation on the data to generate at least one word to be analyzed; determine at least one subject word (e.g., new social security policy, endowment insurance, public accumulation fund, etc.) according to the preset topic; and calculate the similarity between each word to be analyzed and each subject word using a similarity algorithm (e.g., a cosine algorithm based on space vectors, a text similarity algorithm based on semantic similarity, or a Chinese fuzzy search algorithm based on pinyin similarity). When the similarity between a word to be analyzed and any subject word is greater than a preset threshold (e.g., 90%), that word is taken as a target analysis word; the proportion of target analysis words among the total words to be analyzed is then calculated, and if the proportion is greater than a preset proportion threshold (e.g., 50%), the webpage is taken as an initial webpage. In some embodiments, the initial webpage may be a webpage on a public government official website, for example, a webpage of the national social security public service platform of the People's Republic of China;
extracting at least one child URL from a parent URL of at least one initial webpage, wherein the child URL can be a URL corresponding to a link in the at least one initial webpage;
determining at least one target child URL from the at least one child URL based on the publishing time of the at least one child URL and the correlation between the webpage content of the at least one child URL and a preset topic, where the preset topic may be: new social security policy, endowment insurance, public accumulation fund, etc. In some embodiments, when the publishing time of a child URL is not within the current period, the child URL is not a target child URL; when the publishing time of a child URL is within the current period and the correlation between the webpage content of the child URL and the preset topic is greater than the preset correlation threshold, the child URL is a target child URL. The crawler may deconstruct the data of the webpage corresponding to a child URL according to the tags of its items, extract the useful data, and perform word segmentation on the data to generate at least one word to be analyzed; determine at least one subject word (e.g., new social security policy, endowment insurance, public accumulation fund, etc.) according to the preset topic; calculate the similarity of each word to be analyzed with each subject word using a similarity algorithm (e.g., a cosine algorithm based on space vectors, a text similarity algorithm based on semantic similarity, or a Chinese fuzzy search algorithm based on pinyin similarity); and regard a word to be analyzed as a target analysis word when its similarity with any subject word is greater than a preset threshold (e.g., 90%). The ratio of the number of target analysis words to the total number of words to be analyzed is then calculated; if the ratio is greater than the preset ratio threshold (e.g., 50%), the child URL may be a target child URL;
and performing data capture on the father URL and the at least one target child URL of the at least one initial webpage to acquire social security payment standard information of a plurality of regions in the current period.
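The relevance test described in the steps above (segment the page into words, compare each word with the subject words via a similarity algorithm, and keep the page when the share of matching words exceeds a proportion threshold) can be sketched as follows. This is an illustrative simplification: real word segmentation and the similarity algorithms named in the text are replaced by a toy character-overlap cosine measure, and all names are hypothetical.

```python
# Hedged sketch of the page-relevance test: not the application's
# implementation, only an illustration of the thresholding logic.

def char_cosine(a, b):
    """Toy stand-in for the similarity algorithms named in the text:
    cosine similarity over the character sets of two words."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / (len(sa) ** 0.5 * len(sb) ** 0.5)

def is_relevant(words, subject_words, sim_threshold=0.9, ratio_threshold=0.5):
    """Return True when the share of words similar to any subject word
    exceeds the proportion threshold (e.g., 50% in the text)."""
    targets = sum(
        1
        for w in words
        if any(char_cosine(w, s) >= sim_threshold for s in subject_words)
    )
    return bool(words) and targets / len(words) > ratio_threshold
```

In a real crawler the segmented words would come from the deconstructed page data, and the similarity function would be one of the algorithms the text names (space-vector cosine, semantic similarity, or pinyin-based fuzzy matching).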
In some embodiments, the state comparison module may be configured to compare staff social security information of a current period with staff social security information of a previous period, and determine whether the staff social security information is changed.
In some embodiments, the determining, by the status comparison module, whether the employee social security information is changed may include:
judging whether the city in which the employee's employing unit is located, the post, the wage, and the working state of the employee in the current period are consistent with those in the previous period, and judging that the employee social security information has changed when any one or more of them differ between the two periods; for example, if the employee's working state in the current period is "left" while the working state in the previous period was "on the job", the employee social security information is judged to have changed;
if the city in which the employee's employing unit is located, the post, the wage, and the working state in the current period are completely consistent with those in the previous period, judging whether the social security payment standard information of the region corresponding to the city of the employing unit in the current period is consistent with that of the previous period;
if the social security payment standard information of the region corresponding to the city of the employing unit in the current period is inconsistent with that of the previous period, judging that the employee social security information has changed; for example, when the social security payment base or proportion of that region in the current period differs from the base or proportion in the previous period, the employee social security information is judged to have changed.
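The two-stage comparison above can be sketched as follows; the field names are assumptions for illustration, not taken from this application.

```python
# Hedged sketch of the change-detection logic: compare the employee
# fields first, and only when they all match fall back to comparing the
# regional payment standard information.

def social_security_changed(prev, curr, prev_std, curr_std):
    """Return True when the employee social security information changed.

    prev/curr: employee records for the previous and current periods.
    prev_std/curr_std: the region's payment standard info for each period.
    """
    fields = ("city", "post", "wage", "work_status")
    if any(prev[f] != curr[f] for f in fields):
        return True
    # Fields are identical; compare the (less frequently changing, more
    # expensive to obtain) regional payment standard information.
    return prev_std != curr_std
```

Checking the cheap employee fields before the expensive regional comparison is exactly the efficiency argument made in the following paragraph.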
It should be noted that the regional social security payment standard information changes less frequently than the city, post, wage, and working state of the employee's employing unit, and judging whether the regional payment standard information has changed requires acquiring more data. Therefore, first checking whether the city, post, wage, and working state of the employee's employing unit in the current period are consistent with those in the previous period, before comparing the regional social security payment standard information of the two periods, can effectively improve the efficiency of judging whether the employee social security information has changed.
In some embodiments, the social security adjustment module may be configured to adjust the employee social security payment state based on the employee social security information of the current period through the deep reinforcement learning model when the state comparison module determines that the employee social security information changes.
In some embodiments, the employee social security payment state may include whether to stop payment, whether to continue payment, the payment base and proportion, and the like. In some embodiments, the deep reinforcement learning model may include at least one hidden layer, which may be a Long Short-Term Memory network (LSTM); in addition to observations of the employee social security information at each period, the input of the deep reinforcement learning model may include changes in the employee social security information over time, and the long short-term memory network can discover and learn the relationship between the time series and the employee social security information. In some embodiments, given the large search space for training the deep reinforcement learning model, a multi-layer dense neural network may be too large to train; thus, the output layer of the deep reinforcement learning model may be a fully connected neural network. In some embodiments, the social security adjustment module adjusting the employee social security payment state based on the employee social security information of the current period through the deep reinforcement learning model includes: taking the employee social security information of the current period as the input of the deep reinforcement learning model; and adjusting the employee social security payment state through the optimal action value function.
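As a minimal, hypothetical sketch (the action names and values are invented for illustration, not taken from this application), adjusting the payment state "through the optimal action value function" can be read as taking the candidate payment-state action with the largest estimated Q value produced by the output layer:

```python
# Illustrative only: greedy selection over per-action Q-value estimates.
# The action names below are hypothetical payment-state adjustments.

def select_adjustment(q_values):
    """Pick the payment-state action with the largest estimated value."""
    return max(q_values, key=q_values.get)

# Hypothetical output of the model's fully connected layer for one employee.
q_values = {"stop_payment": 0.1, "continue_payment": 0.7, "raise_base": 0.2}
```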
In some embodiments, the strategy for adjusting the employee social security payment state is to maximize the return, i.e., the feedback the agent of the deep reinforcement learning model receives after each operation: a correct operation brings positive feedback and an incorrect operation brings negative feedback. To achieve the goal of maximum return, a Markov decision process is introduced when the employee social security payment state is adjusted based on the employee social security information of the current period through the deep reinforcement learning model. A Markov process is a random process in which the current state distribution probability is related only to the previous moment; a Markov decision process adds one variable to the Markov process: an action a. The state S_{t+1} at the next moment is related not only to the state S_t at the present moment but also to the action A_t. In order to determine the return of an action after a state change, an action value function is defined, i.e., for a given policy π and state s, the value of taking action a:
Q^π(s, a) = E_π[ G_t | S_t = s, A_t = a ]

wherein Q^π(s, a) is, for the given policy π and state s, the value of taking action a; G_t is, for the given policy π and state s, the harvest (return) of taking action a; and E_π[ G_t | S_t = s, A_t = a ] is the expectation of the harvest G_t of action a for the given policy π and state s.

The optimal action value function for taking action a in state s is:

Q*(s, a) = max_π Q^π(s, a)

wherein Q*(s, a) is the largest one of the numerous action state value functions generated under all policies.
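The action value function above is an expectation of the return G_t, which can be illustrated numerically: a Monte Carlo estimate of Q(s, a) averages sampled returns collected after taking action a in state s. The sketch below is illustrative only and is not part of the described system.

```python
# Numerical illustration of Q(s, a) = E[G_t | S_t = s, A_t = a]:
# average discounted returns sampled for a fixed (state, action) pair.

def discounted_return(rewards, gamma):
    """G_t = r_1 + gamma * r_2 + gamma^2 * r_3 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def monte_carlo_q(sampled_reward_sequences, gamma):
    """Estimate Q(s, a) as the mean of sampled returns from (s, a)."""
    returns = [discounted_return(rs, gamma) for rs in sampled_reward_sequences]
    return sum(returns) / len(returns)
```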
In some embodiments, when training the deep reinforcement learning model, a long short-term memory network is adopted to approximate the state value function, i.e., to estimate the Q value. The input data of the model is the state f_t of the current period, and the network outputs the estimated values of all the actions in the action space; the Q value is estimated in each period, and the mean square error (MSE) is used as the loss function of the training model. According to the Markov process, the predicted Q value of each period is Q*(s, a), and the expression of the target Q value is r + γ max Q(s', a'), i.e., the target Q value equals the expected Q value after the state transfers from s to s' and the action transfers from a to a', plus the environment reward r of the current period; γ is a decay factor with a value range of 0 to 1. To maximize the Q value, the loss function should be minimized; after the gradient of the loss function is computed, the parameters are updated iteratively.
The predicted Q value and the target Q value use the same parameter model, so when the predicted Q value increases, the target Q value also increases, which to a certain extent increases the risk of oscillation and divergence of the deep reinforcement learning model. To solve this problem, two networks are used for learning: a prediction network used to evaluate the current state-action value function, and a target network used to generate the target Q value. The deep reinforcement learning model updates the parameters of the prediction network according to the loss function, and after a certain number of iterations copies the parameters of the prediction network into the target network. By introducing the target network, the target Q value is kept unchanged for a period of time, which to a certain extent reduces the correlation between the predicted Q value and the target Q value and improves the stability of the deep reinforcement learning model.
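The target computation and the periodic parameter copy described above can be sketched as follows, with the two networks abstracted as parameter dictionaries; the structures and names are assumptions, since the application does not give concrete implementations.

```python
# Hedged sketch of the DQN-style target value and target-network sync.
# Networks are abstracted as plain parameter dictionaries for illustration.

def target_q(reward, gamma, next_q_values, terminal=False):
    """Target Q value: r + gamma * max_a' Q_target(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)

def maybe_sync(step, sync_every, prediction_params, target_params):
    """Copy prediction-network parameters into the target network every
    sync_every iterations, keeping the target Q stable in between."""
    if step % sync_every == 0:
        target_params.update(prediction_params)
    return target_params
```

Keeping the target network frozen between syncs is what decorrelates the predicted and target Q values, as the paragraph above argues.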
In some embodiments, the data pushing module may be configured to send the staff social security payment status of the last period and the adjusted staff social security payment status to the social security checking terminal 130 after the social security adjustment module adjusts the staff social security payment status. In some embodiments, the data pushing module may further send the social security information of the employee in the previous period and the current period (for example, at least one of the city where the employee's belonged person unit is located, the belonged position, the wage and work state, and the social security payment standard information of the region where the employee's belonged person unit is located) to the social security checking terminal 130. In some embodiments, the employee who manages the employee social security in the employment unit may check the employee social security payment status of the previous period, the adjusted employee social security payment status, the employee social security information of the previous period and the current period through the social security checking terminal 130, and check whether the adjusted employee social security payment status is reasonable based on the employee social security information of the previous period and the current period.
FIG. 4 is an exemplary flow chart of an information management method based on a deep reinforcement learning model according to some embodiments of the present application. As shown in fig. 4, an information management method based on a deep reinforcement learning model includes the following steps. In some embodiments, the information management method based on a deep reinforcement learning model may be performed by the information management system 100 based on a deep reinforcement learning model. The operations of the method presented below are illustrative. In some embodiments, the process may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of the information management method based on a deep reinforcement learning model illustrated in FIG. 4 and described below is not intended to be limiting.
And step 410, obtaining the employee social security information of the current period and the employee social security information of the previous period. For more description of obtaining employee social security information, reference may be made to fig. 3 and its associated description.
And step 420, comparing the employee social security information of the current period with the employee social security information of the previous period, and judging whether the employee social security information is changed. For more description of comparing employee social security information, reference may be made to fig. 3 and its associated description.
And step 430, when the employee social security information is judged to have changed, adjusting the employee social security payment state based on the employee social security information of the current period through the deep reinforcement learning model. For more description of adjusting the employee social security payment state, reference may be made to fig. 3 and its associated description.
In other embodiments of the present application, an information management apparatus based on a deep reinforcement learning model is provided, which includes at least one processing device and at least one storage device; the at least one storage device is used for storing computer instructions, and the at least one processing device is used for executing at least part of the computer instructions to realize the information management method based on the deep reinforcement learning model.
In still other embodiments of the present application, a computer-readable storage medium is provided that stores computer instructions that, when executed by a processing device, implement a deep reinforcement learning model-based information management method as above.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages, and the like. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any network format, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
Numerals describing quantities of components, attributes, etc. are used in some embodiments; it should be understood that such numerals used in the description of the embodiments are modified in some instances by the modifier "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, a numerical parameter should take into account the specified significant digits and employ a general digit-preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the embodiments are approximations, in the specific examples such numerical values are set forth as precisely as practicable.
The entire contents of each patent, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, documents, and the like, are hereby incorporated by reference into this application, except for any application history document that is inconsistent with or conflicts with the content of the present application, and any document (currently or later appended to this application) that limits the broadest scope of the claims of this application. It is noted that if the descriptions, definitions, and/or use of terms in the incorporated materials are inconsistent with or contrary to the content described in this application, the descriptions, definitions, and/or use of terms in this application shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application can be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.

Claims (10)

1. An information management system based on a deep reinforcement learning model, comprising:
the data acquisition module is used for acquiring employee social security information of the current period and employee social security information of the previous period;
the state comparison module is used for comparing the employee social security information of the current period with the employee social security information of the previous period and judging whether the employee social security information is changed;
and the social security adjusting module is used for adjusting the staff social security payment state based on the staff social security information of the current period through a deep reinforcement learning model when the state comparison module judges that the staff social security information is changed.
2. The information management system based on the deep reinforcement learning model as claimed in claim 1, wherein the employee social security information includes employee related information, the employee related information includes a city where an affiliated staff unit of the employee is located, an affiliated post, wages and a working state;
the employee social security information further comprises social security payment standard information of a plurality of regions.
3. The deep reinforcement learning model-based information management system according to claim 2, wherein the data acquisition module acquires the social security payment standard information of the plurality of regions of the current period based on a web crawler.
4. The deep reinforcement learning model-based information management system according to claim 3, wherein the data obtaining module obtains the social insurance payment standard information of the plurality of regions of the current period based on a web crawler, and the obtaining module comprises:
acquiring a parent URL of at least one initial webpage issued in the current period;
extracting at least one child URL from a parent URL of the at least one initial webpage;
determining at least one target sub URL from the at least one sub URL based on the publishing time of the at least one sub URL and the correlation degree of the webpage content of the at least one sub URL and a preset theme;
and performing data capture on the parent URL of the at least one initial webpage and the at least one target child URL to acquire social security payment standard information of the plurality of regions in the current period.
5. The information management system based on the deep reinforcement learning model as claimed in claim 4, wherein the data obtaining module obtains the social insurance payment standard information of the plurality of regions of the current period based on a web crawler, further comprising:
and before data capture is carried out on the parent URL of the at least one initial webpage and the at least one target child URL, carrying out duplicate removal on the parent URL of the at least one initial webpage and the at least one target child URL.
6. The information management system based on the deep reinforcement learning model as claimed in any one of claims 2 to 5, wherein the state comparison module compares the employee social security information of the current period with the employee social security information of the previous period to determine whether the employee social security information is changed includes:
judging whether the city, the post, the wage and the working state of the belonged personnel unit of the employee in the current period are consistent with the city, the post, the wage and the working state of the belonged personnel unit of the employee in the previous period;
if not, judging that the employee social security information is changed;
if so, judging whether the social security payment standard information of the region corresponding to the city where the belonged person unit is located in the current period is consistent with the social security payment standard information of the region corresponding to the city where the belonged person unit is located in the previous period;
and if the social security payment standard information of the region corresponding to the city where the belonged person unit is located in the current period is inconsistent with the social security payment standard information of the region corresponding to the city where the belonged person unit is located in the previous period, judging that the employee social security information is changed.
7. The information management system based on the deep reinforcement learning model is characterized in that the deep reinforcement learning model comprises at least one hidden layer, and the hidden layer is a long-term and short-term memory network;
and the output layer of the deep reinforcement learning model is a fully-connected neural network.
8. The information management system based on the deep reinforcement learning model as claimed in claim 7, wherein the social security adjusting module adjusts the employee social security payment state based on the employee social security information of the current period through the deep reinforcement learning model, including:
taking the employee social security information of the current period as the input of the deep reinforcement learning model;
and adjusting the employee social security payment state through an optimal action value function.
9. The information management system based on the deep reinforcement learning model as claimed in any one of claims 1 to 5, further comprising a data pushing module and at least one social security check terminal;
the data pushing module is used for sending the staff social security payment state of the last period and the adjusted staff social security payment state to the at least one social security checking terminal after the social security adjusting module adjusts the staff social security payment state.
10. An information management method based on a deep reinforcement learning model is characterized by comprising the following steps:
acquiring employee social security information of a current period and employee social security information of a previous period;
comparing the employee social security information of the current period with the employee social security information of the previous period, and judging whether the employee social security information is changed;
and when the employee social security information is judged to be changed, adjusting the employee social security payment state based on the employee social security information of the current period through a deep reinforcement learning model.
CN202111372958.XA 2021-11-19 2021-11-19 Information management method and system based on deep reinforcement learning model Pending CN113807829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111372958.XA CN113807829A (en) 2021-11-19 2021-11-19 Information management method and system based on deep reinforcement learning model


Publications (1)

Publication Number Publication Date
CN113807829A 2021-12-17

Family

ID=78938403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111372958.XA Pending CN113807829A (en) 2021-11-19 2021-11-19 Information management method and system based on deep reinforcement learning model

Country Status (1)

Country Link
CN (1) CN113807829A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120198513A1 (en) * 2011-02-02 2012-08-02 Metasecure Corporation Secure social web orchestration via a security model
CN107463669A (en) * 2017-08-03 2017-12-12 深圳市华傲数据技术有限公司 The method and device for the web data that parsing reptile crawls
CN107944011A (en) * 2017-12-08 2018-04-20 中国平安财产保险股份有限公司 Processing method, device, server and the storage medium of group's declaration form data
CN109598631A (en) * 2018-12-06 2019-04-09 上海佩琪信息技术有限公司 Human resource speciality client's bill generation method and generation system based on social security policy
CN112232557A (en) * 2020-09-30 2021-01-15 中国铁道科学研究院集团有限公司通信信号研究所 Switch machine health degree short-term prediction method based on long-term and short-term memory network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Lancheng: "Network Public Opinion Analysis Technology", 31 October 2014, National Defense Industry Press *
Deng Kaifa: "Artificial Intelligence and Art Design", 30 September 2019, East China University of Science and Technology Press *

Similar Documents

Publication Publication Date Title
US10713716B2 (en) Verification model using neural networks
Sun et al. Big data with ten big characteristics
CN108108499B (en) Face retrieval method, device, storage medium and equipment
US10521489B2 (en) Machine learning to predict numerical outcomes in a matrix-defined problem space
WO2017176356A2 (en) Partitioned machine learning architecture
US20200065656A1 (en) Training neural networks using a clustering loss
US11429915B2 (en) Predicting feature values in a matrix
US11681817B2 (en) System and method for implementing attribute classification for PII data
CN113705554A (en) Training method, device and equipment of image recognition model and storage medium
US10679188B2 (en) Ranking job candidate search results
CN113849848B (en) Data permission configuration method and system
US20220107980A1 (en) Providing an object-based response to a natural language query
CN113886435B (en) Information query method and system based on recurrent neural network
US11010688B2 (en) Negative sampling
CN108268629B (en) Image description method and device based on keywords, equipment and medium
WO2023093015A1 (en) Data screening method and apparatus, device, and storage medium
CN113627797A (en) Image generation method and device for employee enrollment, computer equipment and storage medium
CN113837307A (en) Data similarity calculation method and device, readable medium and electronic equipment
US11163761B2 (en) Vector embedding models for relational tables with null or equivalent values
US11694029B2 (en) Neologism classification techniques with trigrams and longest common subsequences
CN112560480A (en) Task community discovery method, device, equipment and storage medium
Yang et al. Efficient knowledge management for heterogenous federated continual learning on resource-constrained edge devices
CN113807829A (en) Information management method and system based on deep reinforcement learning model
US20230186535A1 (en) Image generation bsaed on ethical viewpoints
CN114330576A (en) Model processing method and device, and image recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211217