Detailed Description
To enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, these technical solutions are described clearly and completely below with reference to the accompanying drawings of one or more embodiments of the present disclosure. The described embodiments are merely some, rather than all, of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of the present disclosure without creative effort shall fall within the scope of the present disclosure.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of systems and methods consistent with certain aspects of the present description, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish pieces of information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
With the growing level of informatization, computer systems are widely used to supervise objects such as enterprises over the network. Specifically, a computer system may acquire various types of data about a supervised object from multiple data sources, fuse the acquired data, and store the fused data in a supervision database in the form of "identifier-attribute" pairs for relevant personnel to query and manage.
For example, an enterprise supervision system may obtain the identifier (ID) of a supervised enterprise, together with attributes such as the enterprise name, operating status, equity structure, and risk data, from a data source such as a web crawler, and store these associated data in the supervision database for relevant personnel to query and manage.
However, in this scheme, data from different data sources may carry different identifiers, and fusion often generates yet another new identifier. As a result, the identifiers of records corresponding to the same supervised object in the supervision database keep changing, which eventually reduces query and management efficiency.
For example, an enterprise supervision system acquires supervision data about the same enterprise A from two data sources, which carry identifier a and identifier b respectively; fusing the two pieces of supervision data generates new supervision data carrying a new identifier c. For a manager, enterprise A therefore has three identifiers: a, b, and c. If fusion is performed repeatedly, the number of identifiers keeps growing, and all of them must be taken into account during management and querying, which is inefficient.
If, to improve the efficiency of subsequent management and querying, only data from a single data source were used and no data fusion were performed, identifier changes caused by fusion would largely be avoided; however, compared with acquiring data from multiple data sources, such an approach may provide poor coverage of data about the supervised object.
In view of this, this specification discloses a technical solution in which the identifier of the supervised object carried in fused supervision data is replaced with the identifier of the supervised object carried in the piece of corresponding pre-fusion supervision data that has the highest credibility score.
In implementation, the corresponding pre-fusion supervision data can be found by searching pre-stored historical supervision data according to the attributes of the supervised object and the time information that the data carries; the piece with the highest credibility score is then selected, and its identifier of the supervised object replaces the identifier in the fused supervision data.
By applying this solution, on the one hand, the attributes of the supervised object from all of the pre-fusion supervision data are retained, so that every data source is fully utilized and the coverage of data acquisition for the supervised object is improved;
on the other hand, the historical supervision data retrieved in this solution carries attributes of the supervised object that match the supervision data to be processed and was updated earlier than it. Replacing the identifier in the supervision data to be processed with the identifier carried by such historical data prevents the identifier of the supervised object from changing after fusion, improves the reliability and stability of the identifiers in the produced supervision data, and thus improves the efficiency of subsequent querying and management of the supervised object.
The following describes the scheme in this specification by using specific embodiments and combining specific application scenarios.
Referring to fig. 1, fig. 1 is a diagram illustrating an application scenario of the data processing method described in this specification;
as shown in fig. 1, in this scenario the supervision system may include a supervision database and a data processing server connected to each other, and the data processing server is connected to a plurality of external data sources. It should be understood that the data processing server, the supervision database, and the data sources may be separate devices that each provide different functions, or separate programs that perform different functions on the same device; this specification does not limit this.
The supervision database may take any form, for example a relational database, a non-relational database, a dedicated database, a database shared with other functions, a single database server, a database server cluster, or a distributed database; this specification does not specifically limit it. The supervision database stores the identifiers of supervised objects and the corresponding attributes for monitoring and management; in an actual design, any component that provides these functions can be regarded as the supervision database.
Structurally, the data processing server sits between the data sources and the supervision database; functionally, it processes source data acquired from the data sources so that the data can be stored in the supervision database. It may be implemented as an independent server device, a running server program, a server cluster providing the above functions, or the like; this specification does not limit the implementation form, which those skilled in the art may determine according to specific needs and related technical documents.
A data source is a component that is structurally connected to the data processing server in the supervision system and functionally provides information related to the objects supervised by the supervision system. It may be an entire data acquisition system that cooperates with the supervision system, or merely the last stage of the path that delivers information about supervised objects to the supervision system. In terms of implementation, a data source may be a web crawler or another program, a forwarding program that obtains information from other web crawlers and forwards it to the data processing server, a preprocessing program that performs data mining and data cleaning and returns the preprocessed data to the data processing server, and so on; those skilled in the art can design the specific implementation according to specific requirements, and this specification does not limit it in detail.
It can be understood that the supervision system may further include other components, such as a component that retrieves data from the supervision database for supervision analysis; because the technical solution disclosed below does not substantially involve the design of such components, they are omitted from fig. 1. The data sources, the data processing server, and the supervision database may also provide other functions; for example, before executing the data processing method described in this specification, the data processing server may also be responsible for tasks such as fusing and splitting supervision data from multiple data sources. Therefore, this specification does not limit the other components of the supervision system or the other functions that the above components can perform, and those skilled in the art can design them according to specific needs and related technical documents;
it can also be understood that the above scenario involving a supervision system is only one feasible example, and in other scenarios where the same problem needs to be solved, those skilled in the art can adapt the design accordingly. For example, if data fusion has been completed but the resulting supervision data has not yet been stored in the supervision database, the method may be executed by the data processing server after fusion is completed; if the supervision data has already been stored in the supervision database, the entity executing the data processing method only needs to be connected to the supervision database and need not be a data processing server connected to both the data sources and the supervision database. This specification does not limit this.
Referring to fig. 2, fig. 2 illustrates a data processing method according to an embodiment of the present disclosure. The method includes the following steps:
S201: acquire supervision data to be processed, where the supervision data to be processed includes supervision data obtained by performing data fusion on supervision data from a plurality of data sources, and the supervision data carries an identifier of a supervised object, attributes of the supervised object, and the time at which the supervision data was updated;
S202: retrieve, from pre-stored historical supervision data, a plurality of pieces of historical supervision data whose carried attributes of the supervised object match the supervision data to be processed and whose update time is earlier than that of the supervision data to be processed;
S203: determine, among the plurality of pieces of historical supervision data, the piece with the highest credibility score, and replace the identifier of the supervised object carried in the supervision data to be processed with the identifier of the supervised object carried in that piece, where the credibility score indicates the credibility of the supervision data.
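Steps S201 to S203 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `SupervisionData` record type, the representation of attributes as a set of strings, the "every historical attribute appears in the fused record" match rule, and the default score (the number of carried attributes, which is one embodiment described in this specification) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class SupervisionData:
    """Hypothetical record: identifier, attributes and update time of a supervised object."""
    object_id: str
    attributes: FrozenSet[str]  # e.g. "name=Apple company"; the format is an assumption
    updated_at: int             # timestamp or any monotonically increasing counter


def attribute_count(record: SupervisionData) -> float:
    """Default credibility score: more carried attributes means more credible (assumed)."""
    return float(len(record.attributes))


def process_supervision_data(
    pending: SupervisionData,            # S201: the acquired supervision data to be processed
    history: Iterable[SupervisionData],  # pre-stored historical supervision data
    score: Callable[[SupervisionData], float] = attribute_count,
) -> SupervisionData:
    # S202: keep historical records whose attributes all appear in the pending
    # (fused) record and that were updated before it.
    candidates = [
        h for h in history
        if h.attributes
        and h.attributes <= pending.attributes
        and h.updated_at < pending.updated_at
    ]
    if not candidates:
        return pending  # nothing to inherit from; keep the identifier (assumption)
    # S203: adopt the identifier carried by the most credible candidate.
    best = max(candidates, key=score)
    return replace(pending, object_id=best.object_id)


if __name__ == "__main__":
    history = [
        SupervisionData("ID-01", frozenset({"name=Apple company"}), 1),
        SupervisionData("ID-02", frozenset({"name=Apple Inc.", "status=operating"}), 2),
    ]
    fused = SupervisionData(
        "ID-NEW",
        frozenset({"name=Apple company", "name=Apple Inc.", "status=operating"}),
        3,
    )
    print(process_supervision_data(fused, history).object_id)  # → ID-02
```

Passing the score as a function keeps S203 independent of how credibility is computed, which the specification deliberately leaves open.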
The identifier of the supervised object includes information that uniquely identifies the supervised object. For example, when the supervised object is an enterprise, the identifier may be an enterprise registration number, an enterprise standard number, or the like; when the supervised object is a personal asset, the identifier may be an asset account identification code or the like. Each data source may also adopt a customized identifier format, such as a combined "asset account-identification number" format; this specification does not limit this. In addition, because this solution may involve pre-fusion supervision data from several different data sources, and hence several different identifier formats, the same supervised object may have different identifiers in data obtained from different data sources.
The attributes of the supervised object include information describing the characteristics or state of that object. For example, when the supervised object is an enterprise, the attributes may include the enterprise name, enterprise type, operating status, equity structure, risk analysis information, and so on; when the supervised object is a personal asset, the attributes may include the asset type, asset amount, expected asset appreciation analysis, and the like. Each data source may also adopt a customized attribute format, such as a combined "feature-feature value" format; this specification does not limit this. In addition, because this solution may involve several different data sources, and hence several different attribute representations, the same attribute of the same supervised object may be represented differently in data acquired from different data sources.
When the supervision data to be processed is obtained by fusing supervision data from a plurality of data sources, its update time may indicate when the corresponding fusion took place. The update time may be a timestamp or any other counter that reflects temporal order, such as an incrementing magic number.
Data fusion may be a process of merging several pieces of supervision data into one newly created piece of supervision data. For ease of description, the supervision data that undergoes fusion is referred to below as pre-fusion supervision data, and the product of fusion is referred to as post-fusion supervision data.
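A minimal sketch of such a fusion step is shown below to make the identifier churn concrete. It assumes the same hypothetical record layout as elsewhere (attributes as a set of strings); the `FUSED-` identifier scheme and the rule "update time = latest input time + 1" are illustrative stand-ins for a real identifier generator and a real fusion timestamp.

```python
from dataclasses import dataclass
from itertools import count
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SupervisionData:
    object_id: str
    attributes: FrozenSet[str]
    updated_at: int


_new_id_counter = count(1)  # hypothetical generator of fresh identifiers


def fuse(records: Iterable[SupervisionData]) -> SupervisionData:
    """Merge pre-fusion records into one newly created post-fusion record.

    All input attributes are kept (set union), the update time is set later than
    every input, and a brand-new identifier is generated: this new identifier is
    what the replacement step later swaps for a more stable one.
    """
    records = list(records)
    if not records:
        raise ValueError("at least one record is required for fusion")
    merged = frozenset().union(*(r.attributes for r in records))
    updated_at = max(r.updated_at for r in records) + 1
    return SupervisionData(f"FUSED-{next(_new_id_counter):04d}", merged, updated_at)
```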
The credibility score includes any score that can indicate the credibility of the supervision data, and may be derived from several evaluation dimensions. For example, weights may be preset for the data sources from which supervision data comes: bulletins of more credible government departments may be given a higher credibility weight, and personal information-publishing accounts with lower credibility may be given a lower one. As another example, supervision data that carries a richer variety of attributes of the supervised object is generally more credible and may be given a higher credibility score; conversely, if a piece of supervision data carries only a few attributes of the supervised object, it may not come from an information-rich data source and may be given a lower score. The specific calculation of the credibility score is therefore not limited in this specification and can be determined by those skilled in the art according to specific needs.
In one embodiment, the credibility score may be calculated from the number of attributes of the supervised object carried by the corresponding supervision data: the more attributes it carries, the richer its information about the supervised object, and the more credible it can be considered. That is, the credibility score is positively correlated with the number of attributes of the supervised object carried by the corresponding supervision data.
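One way to combine the two dimensions above can be sketched as follows. The attribute-count part follows the embodiment directly; the source names, their weights, and the choice to multiply the count by the source weight are hypothetical, since the specification only says that more credible sources may be weighted higher.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical source weights: only the ordering (government bulletin above a
# personal account) is suggested by the text; the numbers are illustrative.
SOURCE_WEIGHTS: Dict[str, float] = {
    "government_bulletin": 1.0,
    "news_site": 0.6,
    "personal_account": 0.2,
}


def credibility_score(
    attributes: FrozenSet[str],
    source: Optional[str] = None,
    weights: Optional[Dict[str, float]] = None,
    unknown_source_weight: float = 0.1,
) -> float:
    """Score that grows with the number of carried attributes.

    Without a source this is exactly the attribute-count embodiment; with a
    source, the count is scaled by that source's weight (an assumed combination).
    """
    richness = float(len(attributes))
    if source is None:
        return richness
    table = SOURCE_WEIGHTS if weights is None else weights
    return table.get(source, unknown_source_weight) * richness
```

Multiplying rather than adding keeps the score positively correlated with the attribute count for any fixed source, as the embodiment requires.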
This specification does not limit the channel through which the supervision data to be processed is acquired. For example, when the supervision data to be processed is obtained by fusing supervision data from a plurality of data sources, it may be acquired in the data processing server and processed before being stored in the supervision database, or it may be extracted from the supervision database and processed after having been stored there.
In one embodiment, after data processing is completed, the processed supervision data may be stored in the supervision database. The mode and strategy for interacting with the supervision database can be determined according to specific requirements with reference to related technologies, and this specification does not specifically limit them; for example, data may be written in batches for better throughput, or in real time for lower latency.
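The batch-versus-real-time trade-off can be illustrated with a small buffered writer. This sketch assumes SQLite (Python's standard `sqlite3` module) as a stand-in for the supervision database; the table name `supervision` and its three-column schema are hypothetical.

```python
import sqlite3
from typing import List, Tuple


class BatchWriter:
    """Buffers processed records and writes them to the database in batches.

    batch_size=1 behaves like real-time writing (lowest latency); larger values
    trade latency for fewer database round trips. Table and schema are hypothetical.
    """

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.buffer: List[Tuple[str, str, int]] = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS supervision "
            "(object_id TEXT, attributes TEXT, updated_at INTEGER)"
        )

    def write(self, object_id: str, attributes: str, updated_at: int) -> None:
        self.buffer.append((object_id, attributes, updated_at))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            with self.conn:  # commits the transaction if the insert succeeds
                self.conn.executemany(
                    "INSERT INTO supervision VALUES (?, ?, ?)", self.buffer
                )
            self.buffer.clear()
```

A caller would invoke `flush()` once more at shutdown so that a partially filled batch is not lost.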
In this specification, several pieces of pre-fusion supervision data may be obtained by searching pre-stored historical supervision data. As described above for the attributes of the supervised object, the attributes in the post-fusion supervision data are derived from several pieces of pre-fusion supervision data, so the retrieval condition may include that the attributes of the supervised object carried by the historical data match the post-fusion supervision data. Moreover, because the fused supervision data is updated at the time of fusion, the retrieval condition may further include that the historical supervision data was updated earlier than the supervision data to be processed.
Specifically, matching may mean strict correspondence of character strings or equivalence at the semantic level; for example, two different written forms of the company name "Apple company" may be considered semantically identical. If the attributes carried by two pieces of supervision data match, the two pieces actually refer to the same supervised object; further, if the attributes of the supervised object carried by several pieces of historical supervision data match the fused supervision data, those pieces can be identified as the pre-fusion supervision data.
It can be understood that, in the above process, the specific criteria for determining a match and the specific matching algorithm can be determined by those skilled in the art according to specific requirements with reference to related technologies; this specification does not specifically limit them.
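The difference between strict and semantic-level matching can be sketched as below. Real semantic matching would need an entity-resolution or language-understanding component; here it is approximated, purely as an assumption, by Unicode normalization, case folding, whitespace collapsing, and a hypothetical alias table that maps one written form of a name to another.

```python
import unicodedata
from typing import Dict

# Hypothetical alias table standing in for semantic-level matching; a production
# system might use an entity-resolution service or a language model instead.
ALIASES: Dict[str, str] = {"苹果公司": "apple company"}


def normalize(value: str) -> str:
    """Canonical form: NFKC, case-folded, whitespace collapsed, aliases resolved."""
    text = " ".join(unicodedata.normalize("NFKC", value).casefold().split())
    return ALIASES.get(text, text)


def strict_match(a: str, b: str) -> bool:
    """Strict correspondence of character strings."""
    return a == b


def semantic_match(a: str, b: str) -> bool:
    """Approximate semantic-level equivalence via normalization plus aliases."""
    return normalize(a) == normalize(b)
```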
In this specification, the piece of supervision data with the highest credibility score may be determined among the retrieved supervision data. As mentioned above, the credibility score indicates the credibility of the supervision data: the higher the score of a piece of supervision data, the more credible it is, and it is reasonable to consider that the identifier of the supervised object it carries is also more credible;
for example, if a piece of historical supervision data A from an unknown website and a piece of historical supervision data B published by a government department are both retrieved, and the credibility score of B is significantly higher than that of A, then using the identifier of the supervised object carried in B better ensures the credibility of the identifier.
In this specification, the identifier of the supervised object carried in the fused supervision data may then be replaced with the identifier of the supervised object carried in the supervision data determined to have the highest credibility score; the specific implementation can be determined by those skilled in the art according to specific needs and related technologies, and this specification does not specifically limit it.
Referring to the upper half of fig. 3, which is an exemplary diagram of data fusion described in this specification: in this example there are three pieces of pre-fusion supervision data, namely supervision data with identifier ID-01 whose attribute is one written form of the name "Apple company", supervision data with identifier ID-02 whose attribute is another written form of "Apple company", and supervision data with identifier ID-03 whose attribute is "big Apple company". There is also one piece of fused supervision data, which carries all three of these attributes;
assume that, after the fused supervision data is obtained, the three pieces of pre-fusion supervision data are retrieved from the historical supervision data by character-string matching, and that the pre-fusion supervision data with identifier ID-02 has the highest credibility score. The identifier of the supervised object in the fused data is then replaced with ID-02, as shown by the dotted line in fig. 3.
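The fig. 3 fusion example can be replayed in a few lines. The name strings are placeholders for the two written forms of "Apple company" and for "big Apple company", and the credibility scores are assumed values chosen only so that ID-02 ranks highest, as the example stipulates.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class SupervisionData:
    object_id: str
    attributes: FrozenSet[str]
    updated_at: int


# Pre-fusion records of fig. 3 (upper half); name strings are placeholders.
history: List[SupervisionData] = [
    SupervisionData("ID-01", frozenset({"Apple company (form 1)"}), 1),
    SupervisionData("ID-02", frozenset({"Apple company (form 2)"}), 2),
    SupervisionData("ID-03", frozenset({"big Apple company"}), 3),
]
# Assumed credibility scores, chosen so that ID-02 ranks highest as in the example.
scores: Dict[str, float] = {"ID-01": 0.5, "ID-02": 0.9, "ID-03": 0.4}

# Fusion produces a new record carrying all three attributes and a new identifier.
fused = SupervisionData("NEW-ID", frozenset().union(*(h.attributes for h in history)), 4)

# Character-string matching: a historical record matches if each of its attributes
# appears verbatim in the fused record and it was updated before the fusion.
matched = [h for h in history
           if h.attributes <= fused.attributes and h.updated_at < fused.updated_at]
best = max(matched, key=lambda h: scores[h.object_id])
result = replace(fused, object_id=best.object_id)
print(result.object_id)  # → ID-02
```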
Through this replacement, the identifier of the supervised object carried in the supervision data does not change frequently before and after fusion, which improves its stability; meanwhile, because the replacement identifier comes from the supervision data with the highest credibility score, its credibility is also improved. Compared with the related art, the identifiers of supervised objects in the produced supervision data are more stable and more credible, which improves the efficiency of querying and managing the supervision data.
In this specification, the supervision data to be processed may further include several pieces of supervision data obtained by splitting fused supervision data. For example, an upstream data source may find that a piece of fused supervision data actually contains information about several supervised objects, and therefore split it into several pieces of supervision data corresponding to the different supervised objects. How to determine whether fused supervision data contains information about several supervised objects, and how to perform the split, can be determined by those skilled in the art with reference to the related art; this specification does not specifically limit them.
It can be understood that the update time carried by the split supervision data may indicate when the corresponding split operation took place.
In one embodiment, for the pieces of supervision data produced by a split, the pre-split supervision data may be retrieved from pre-stored historical supervision data, and the identifier of the supervised object carried by the post-split supervision data with the highest credibility score may then be replaced with the identifier of the supervised object carried by the retrieved pre-split supervision data;
specifically, because the attributes of the supervised object carried by the post-split supervision data are derived from the pre-split supervision data, the retrieval condition may include that the carried attributes of the supervised object match the post-split supervision data; and because the post-split supervision data is necessarily updated later than the pre-split supervision data, the retrieval condition may further include that the historical supervision data was updated earlier than the supervision data to be processed;
the post-split supervision data with the highest credibility score may be determined either by directly reading a pre-stored mark indicating which piece has the highest score, or by obtaining the credibility score of each piece and then selecting the highest; the credibility score of each piece may in turn be calculated in real time for the post-split supervision data or read from a preset data structure.
Referring to the lower half of fig. 3, which is an exemplary diagram of a data splitting scenario described in this specification: in this example, it is found that the two written forms of "Apple company" indicate the same company, but "big Apple company" indicates a different company; that is, the fused attributes of the supervised object actually describe more than one company, so the fused data is split;
two pieces of post-split supervision data are obtained: one corresponding to "big Apple company", and one corresponding to the two written forms of "Apple company". Assuming that the latter has the higher credibility score, the identifier of the supervised object in it may be replaced with the identifier carried by the pre-split supervision data, namely ID-02, which that data inherited from the pre-fusion supervision data; "big Apple company" cannot inherit ID-02 because the credibility score of its supervision data is not high enough. Instead, its earlier identifier ID-03 may be recovered and reused by further mining the historical supervision data, or the system may assign it a new identifier; this specification does not specifically limit this.
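The splitting case can be sketched as follows, replaying the lower half of fig. 3. Several details are assumptions rather than statements of the specification: the pre-split record is identified as a historical record updated before the split whose attributes cover every attribute of the parts, the most recent such record is used, and the score is the attribute count. The name strings are placeholders.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, List, Sequence


@dataclass(frozen=True)
class SupervisionData:
    object_id: str
    attributes: FrozenSet[str]
    updated_at: int


def inherit_after_split(
    parts: Sequence[SupervisionData],
    history: Sequence[SupervisionData],
    score: Callable[[SupervisionData], float],
) -> List[SupervisionData]:
    """Give the most credible post-split record the identifier of the pre-split record."""
    if not parts:
        return []
    split_time = min(p.updated_at for p in parts)
    split_attrs = frozenset().union(*(p.attributes for p in parts))
    # Pre-split record: updated before the split and carrying every attribute
    # that the parts carry (the "covers all attributes" rule is an assumption).
    candidates = [h for h in history
                  if h.updated_at < split_time and split_attrs <= h.attributes]
    if not candidates:
        return list(parts)
    pre_split = max(candidates, key=lambda h: h.updated_at)  # most recent match
    best = max(parts, key=score)
    return [replace(p, object_id=pre_split.object_id) if p is best else p
            for p in parts]


if __name__ == "__main__":
    apple_1, apple_2, big = "Apple company (form 1)", "Apple company (form 2)", "big Apple company"
    history = [
        SupervisionData("ID-01", frozenset({apple_1}), 1),
        SupervisionData("ID-02", frozenset({apple_2}), 2),
        SupervisionData("ID-03", frozenset({big}), 3),
        # Fused record after its identifier was replaced with ID-02 (upper half of fig. 3).
        SupervisionData("ID-02", frozenset({apple_1, apple_2, big}), 4),
    ]
    parts = [
        SupervisionData("SPLIT-1", frozenset({apple_1, apple_2}), 5),
        SupervisionData("SPLIT-2", frozenset({big}), 5),
    ]
    result = inherit_after_split(parts, history, score=lambda d: len(d.attributes))
    print([r.object_id for r in result])  # → ['ID-02', 'SPLIT-2']
```

The record for "big Apple company" keeps its provisional identifier here; recovering ID-03 or assigning a new identifier would be a separate step.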
In one embodiment, the processed supervision data may also be checked, based on a preset duplicate-checking policy, for data indicating the same supervised object; if such data exists, a deduplication operation is performed on it. Specifically, among the data indicating the same supervised object, the piece with the highest credibility score may be retained and the others deleted.
In one embodiment, the duplicate check may be performed in any one or more of the following ways: semantic recognition may be performed on the processed supervision data, and whether it contains several pieces of supervision data indicating the same supervised object is determined from the recognition result; for example, in the above example, the two written forms of "Apple company" are semantically equivalent, so both can be determined to indicate the same supervised object;
character-string matching may be performed on the processed supervision data, and whether it contains several pieces of supervision data indicating the same supervised object is determined from the matching result; for example, "Apple company" and "Apple (Apple) company" have a high overlap at the character-string level, so both may be considered to indicate the same supervised object;
keywords may be extracted from the processed supervision data and queried in a third-party database, and whether the processed supervision data contains several pieces indicating the same supervised object is determined from the query results; for example, if both written forms of "Apple company", used as keywords, retrieve the same entry in a third-party database, they can be judged to indicate the same supervised object;
those skilled in the art may select or combine the above implementations according to specific requirements.
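A minimal deduplication sketch follows. The character-string check uses `difflib.SequenceMatcher` from Python's standard library with an assumed 0.8 threshold; semantic recognition or a third-party lookup could be supplied through the same `same` parameter, alone or combined. The `Record` layout, which compares only a single name attribute, is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List, NamedTuple


class Record(NamedTuple):
    object_id: str
    name: str           # the attribute compared for duplicates (assumption)
    credibility: float


def string_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Character-string check: similarity ratio of the lower-cased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def deduplicate(records: List[Record], same: Callable[[str, str], bool]) -> List[Record]:
    """Keep only the most credible record among those judged to be the same object."""
    kept: List[Record] = []
    # Visit records from most to least credible, so the first record kept for an
    # object is the one with the highest credibility score.
    for rec in sorted(records, key=lambda r: r.credibility, reverse=True):
        if not any(same(rec.name, k.name) for k in kept):
            kept.append(rec)
    return kept
```

With the default threshold, `string_match("Apple company", "big Apple company")` returns `True`, which would wrongly merge two different companies; this is one reason to combine string matching with semantic recognition or a third-party lookup rather than rely on it alone.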
In one embodiment, abnormal data may be repaired using the pre-stored historical supervision data. Specifically, whether the processed supervision data contains abnormalities may be determined by invoking a preset anomaly detection algorithm; in general, a piece of supervision data may be regarded as abnormal if the attributes of the supervised object it carries, the identifier of the supervised object it carries, or both are abnormal. If abnormal supervision data is found, the historical supervision data corresponding to it may be obtained from the pre-stored historical supervision data, and the abnormal supervision data may be repaired according to the obtained historical supervision data;
the anomaly detection algorithm can be designed by those skilled in the art according to specific requirements with reference to related technologies, and this specification does not set it out in detail. The repair may directly replace the abnormal part with the corresponding normal part of the acquired historical supervision data, or may use any other feasible repair approach; those skilled in the art may select the specific implementation with reference to the related art, and this specification does not specifically limit it.
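The "replace the abnormal part with the normal part of the historical data" repair can be sketched as follows. Both the anomaly rule (a blank identifier or a blank attribute value) and the way the corresponding historical record is found (same `name` attribute, latest earlier update) are hypothetical simplifications; a real deployment would plug in its own detection algorithm.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass
class SupervisionData:
    object_id: str
    attributes: Dict[str, str]
    updated_at: int


def is_abnormal(rec: SupervisionData) -> bool:
    """Hypothetical detector: blank identifier or any blank attribute value."""
    return not rec.object_id.strip() or any(not v.strip() for v in rec.attributes.values())


def repair(rec: SupervisionData, history: List[SupervisionData]) -> SupervisionData:
    """Fill abnormal parts of `rec` from the latest earlier record with the same name."""
    if not is_abnormal(rec):
        return rec
    name = rec.attributes.get("name", "")
    matches = [h for h in history
               if name and h.attributes.get("name") == name and h.updated_at < rec.updated_at]
    if not matches:
        return rec  # nothing to repair from; left for other handling (assumption)
    ref = max(matches, key=lambda h: h.updated_at)
    fixed_attrs = {k: (v if v.strip() else ref.attributes.get(k, v))
                   for k, v in rec.attributes.items()}
    fixed_id = rec.object_id if rec.object_id.strip() else ref.object_id
    return replace(rec, object_id=fixed_id, attributes=fixed_attrs)
```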
Referring to fig. 4, fig. 4 is a diagram illustrating a structure of the data processing apparatus according to the present disclosure; the data processing apparatus may include the following modules:
an obtaining module 401, configured to obtain supervision data to be processed, where the supervision data to be processed includes supervision data obtained by performing data fusion on supervision data from a plurality of data sources, and the supervision data carries an identifier of a supervised object, attributes of the supervised object, and the time at which the supervision data was updated;
a first retrieving module 402, configured to retrieve, from pre-stored historical supervision data, a plurality of pieces of historical supervision data whose carried attributes of the supervised object match the supervision data to be processed and whose update time is earlier than that of the supervision data to be processed;
a first replacement module 403, configured to determine, among the plurality of pieces of historical supervision data, the piece with the highest credibility score, and replace the identifier of the supervised object carried in the supervision data to be processed with the identifier of the supervised object carried in that piece, where the credibility score indicates the credibility of the supervision data.
In this specification, the channel through which the obtaining module 401 obtains the supervision data to be processed is not limited. For example, when the supervision data to be processed is obtained by fusing supervision data from a plurality of data sources, it may be acquired in the data processing server and subjected to the subsequent processing before being stored in the supervision database, or it may be extracted from the supervision database and subjected to the subsequent processing after having been stored there.
In one embodiment, the apparatus may further include a storage module configured to store the processed supervision data in a supervision database. The manner and strategy of interaction with the supervision database may be determined according to specific requirements with reference to the related art, and this specification does not particularly limit them; for example, the data may be stored in batches to achieve higher throughput, or in real time to achieve lower latency.
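As a non-limiting sketch of the batch-versus-real-time trade-off, the storage module might buffer processed records and flush them in groups. The class name SupervisionWriter, the table name supervision and its columns are hypothetical, and an in-memory SQLite database merely stands in for the supervision database; setting batch_size to 1 approximates real-time storage.

```python
import sqlite3
from typing import List, Tuple


class SupervisionWriter:
    """Hypothetical storage module that buffers rows and writes them in batches.

    batch_size=1 approximates real-time storage (lower latency); larger values
    reduce the number of database round trips (higher throughput).
    """

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer: List[Tuple[str, str, str]] = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS supervision "
            "(object_id TEXT, payload TEXT, updated_at TEXT)"
        )

    def write(self, object_id: str, payload: str, updated_at: str) -> None:
        self.buffer.append((object_id, payload, updated_at))
        if len(self.buffer) >= self.batch_size:
            self.flush()  # buffer full: store the whole batch at once

    def flush(self) -> None:
        # Store any remaining buffered rows in one transaction.
        if self.buffer:
            self.conn.executemany(
                "INSERT INTO supervision VALUES (?, ?, ?)", self.buffer
            )
            self.conn.commit()
            self.buffer.clear()
```

A caller would normally invoke flush() once more at shutdown so that a partially filled batch is not lost.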
In this specification, the first retrieving module 402 may retrieve, from the pre-stored historical supervision data, several pieces of supervision data from before the fusion. As described above for the attributes of the supervised object, the attributes of the supervised object in the fused supervision data are derived from several pieces of pre-fusion supervision data; therefore, the retrieval condition may include that the attributes of the supervised object carried by a piece of historical supervision data match those in the fused supervision data. Moreover, because the fused supervision data is updated at the time of fusion, the retrieval condition may further include that the historical supervision data was updated earlier than the supervision data to be processed.
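The two retrieval conditions can be illustrated with the following minimal Python sketch. The record layout and, in particular, the matching criterion (at least one attribute with the same name and value) are hypothetical assumptions; as noted in the specification, the actual matching criterion is left to those skilled in the art.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class SupervisionRecord:
    """Hypothetical record layout used only for this illustration."""
    object_id: str            # identifier of the supervised object
    attributes: dict          # attributes of the supervised object
    updated_at: datetime      # time at which the record was updated
    credibility: float = 0.0  # credibility score


def attributes_match(candidate: dict, target: dict) -> bool:
    # Assumed matching criterion: at least one attribute has the same name and value.
    return any(k in target and target[k] == v for k, v in candidate.items())


def retrieve_candidates(fused: SupervisionRecord,
                        history: List[SupervisionRecord]) -> List[SupervisionRecord]:
    """First retrieval: historical records that match and predate the fused record."""
    return [h for h in history
            if attributes_match(h.attributes, fused.attributes)  # condition 1
            and h.updated_at < fused.updated_at]                 # condition 2
```

A historical record that matches but was updated after the fused record is excluded by the second condition, since it cannot be one of the pre-fusion inputs.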
It can be understood that, in the above process, the specific criterion for determining a match and the specific matching algorithm may be determined by those skilled in the art according to specific requirements with reference to the related art, and this specification does not specifically limit them.
In this specification, the first replacement module 403 may determine, from among the several pieces of retrieved supervision data, the piece with the highest credibility score. As mentioned above, the credibility score indicates the credibility of the supervision data: the higher the credibility score of a piece of supervision data, the more credible it is, and it is therefore reasonable to consider that the identifier of the supervised object it carries is also more credible.
For example, suppose that a piece of historical supervision data A from an unknown website and a piece of historical supervision data B published by a relevant government department are both retrieved, and that the credibility score of B is significantly higher than that of A; using the identifier of the supervised object carried in B then ensures the credibility of the identifier to a greater extent.
In this specification, the first replacement module 403 may then replace the identifier of the supervised object carried in the fused supervision data with the identifier of the supervised object carried in the supervision data determined to have the highest credibility score; the specific implementation may be determined by those skilled in the art according to specific needs and the related art, and this specification does not particularly limit it.
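The selection and replacement performed by the first replacement module 403 may be sketched as follows, assuming a hypothetical record layout with a numeric credibility field. This is one possible implementation under stated assumptions, not the only one.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List


@dataclass
class SupervisionRecord:
    """Hypothetical record layout used only for this illustration."""
    object_id: str            # identifier of the supervised object
    attributes: dict          # attributes of the supervised object
    updated_at: datetime      # time at which the record was updated
    credibility: float = 0.0  # credibility score


def replace_identifier(fused: SupervisionRecord,
                       candidates: List[SupervisionRecord]) -> SupervisionRecord:
    """Give the fused record the identifier of the most credible candidate."""
    if not candidates:
        return fused  # nothing retrieved: keep the current identifier
    # max() keeps the first candidate on ties; a real system may need another tie-break.
    best = max(candidates, key=lambda r: r.credibility)
    return replace(fused, object_id=best.object_id)
```

Mirroring the example of data A and data B, if A (from an unknown website) is given a score of 0.2 and B (from a government department) a score of 0.9, the fused record receives B's identifier.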
Through this replacement, the identifier of the supervised object carried in the supervision data is prevented from changing frequently before and after fusion, which improves the stability of the identifier. Meanwhile, because the identifier used for the replacement comes from the supervision data with the highest credibility score, the credibility of the identifier is also improved. Compared with the related art, the identifiers of supervised objects in the produced supervision data are more stable and more credible, which can improve the efficiency of querying and managing the supervision data.
In this specification, the supervision data to be processed may further include a plurality of pieces of supervision data obtained by splitting fused supervision data. For example, an upstream data source may find that a piece of fused supervision data actually contains information about multiple supervised objects; that piece is then split into a plurality of pieces of supervision data, each corresponding to a different supervised object. How to determine whether fused supervision data contains information about multiple supervised objects, and how to perform the split, may be determined by those skilled in the art with reference to the related art, and this specification does not specifically limit them.
It can be understood that the update time carried by each piece of split supervision data may indicate the time at which the corresponding split operation occurred.
In an embodiment shown, the data processing apparatus may further include a second retrieval module and a second replacement module. For the plurality of pieces of split supervision data, the second retrieval module may retrieve the pre-split supervision data from the pre-stored historical supervision data, and the second replacement module may replace the identifier of the supervised object carried by the piece of split supervision data with the highest credibility score with the identifier of the supervised object carried by the retrieved pre-split supervision data.
Specifically, since the attributes of the supervised objects carried by the split supervision data are derived from the pre-split supervision data, the retrieval condition may include that the attributes of the supervised object carried by a piece of historical supervision data match those of the plurality of pieces of split supervision data; and since the split supervision data is necessarily updated later than the pre-split supervision data, the retrieval condition may further include that the historical supervision data was updated earlier than the supervision data to be processed.
The piece of split supervision data with the highest credibility score may be determined either by directly reading a pre-stored flag that identifies it, or by obtaining the credibility score of each piece and then selecting the one with the highest score. The credibility score of each piece may in turn be calculated in real time for the plurality of pieces of split supervision data, or read from a preset data structure.
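A minimal Python sketch of the second retrieval and second replacement is given below. It assumes a hypothetical record layout, the same hypothetical matching criterion as before (at least one attribute with the same name and value), that the pre-split record must match every split record, and that the latest such record is used when several qualify. The credibility scores are read from the records, which corresponds to reading them from a preset data structure; computing them on the fly would work equally well.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List


@dataclass
class SupervisionRecord:
    """Hypothetical record layout used only for this illustration."""
    object_id: str            # identifier of the supervised object
    attributes: dict          # attributes of the supervised object
    updated_at: datetime      # time at which the record was updated
    credibility: float = 0.0  # credibility score


def attributes_match(candidate: dict, target: dict) -> bool:
    # Assumed matching criterion: at least one attribute has the same name and value.
    return any(k in target and target[k] == v for k, v in candidate.items())


def reassign_after_split(split_records: List[SupervisionRecord],
                         history: List[SupervisionRecord]) -> List[SupervisionRecord]:
    if not split_records:
        return split_records
    split_time = min(r.updated_at for r in split_records)
    # Second retrieval: historical data matching every split record
    # and updated before the split took place.
    pre_split = [h for h in history
                 if h.updated_at < split_time
                 and all(attributes_match(h.attributes, r.attributes) for r in split_records)]
    if not pre_split:
        return split_records
    source = max(pre_split, key=lambda h: h.updated_at)  # latest pre-split version
    # Second replacement: only the most credible split record inherits the old identifier.
    best = max(range(len(split_records)), key=lambda i: split_records[i].credibility)
    result = list(split_records)
    result[best] = replace(result[best], object_id=source.object_id)
    return result
```

In this sketch the remaining split records keep their own identifiers, so the original identifier continues to refer to the most credible successor of the pre-split data.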
In an embodiment shown, the data processing apparatus may further include a duplicate removal module, which may check, based on a preset duplicate-checking policy, whether the processed supervision data contains multiple pieces of data indicating the same supervised object; if so, a deduplication operation is performed on those pieces. Specifically, the piece with the highest credibility score among the data indicating the same supervised object may be determined and retained, and the other pieces may be deleted.
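The retain-the-most-credible deduplication rule may be sketched as follows. The preset duplicate-checking policy is modelled here as a hypothetical caller-supplied key function that maps records indicating the same supervised object to the same key; the record layout is likewise an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass
class SupervisionRecord:
    """Hypothetical record layout used only for this illustration."""
    object_id: str            # identifier of the supervised object
    attributes: dict          # attributes of the supervised object
    credibility: float = 0.0  # credibility score


def deduplicate(records: List[SupervisionRecord],
                same_object_key: Callable[[SupervisionRecord], Hashable]) -> List[SupervisionRecord]:
    """Keep only the most credible record for each supervised object."""
    best: Dict[Hashable, SupervisionRecord] = {}
    for r in records:
        key = same_object_key(r)
        if key not in best or r.credibility > best[key].credibility:
            best[key] = r  # retain the higher-scoring record, drop the other
    return list(best.values())
```

For example, a policy that treats records with the same case-insensitive, trimmed name as the same supervised object can be passed as lambda r: r.attributes["name"].strip().lower().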
In an embodiment shown, the duplicate removal module may perform the duplicate check in any one or more of the following manners: performing semantic recognition on the processed supervision data, and determining, according to the semantic recognition result, whether the processed supervision data contains multiple pieces of supervision data indicating the same supervised object; performing string matching on the processed supervision data, and making the same determination according to the string matching result; or extracting keywords from the processed supervision data, querying a third-party database with the keywords, and making the same determination according to the query result.
Those skilled in the art may select any one of the above manners, or combine several of them, according to specific requirements.
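For the string-matching manner, one possible sketch uses the similarity ratio from Python's standard difflib module. The choice of comparing supervised-object names, the normalization step, and the 0.9 threshold are hypothetical assumptions; the pairwise comparison is quadratic in the number of records, so large volumes would likely require blocking or indexing in practice.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def normalize(text: str) -> str:
    # Lower-case and collapse whitespace so formatting differences are ignored.
    return " ".join(text.lower().split())


def find_duplicate_pairs(names: List[str], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return index pairs whose normalized names are similar enough to be duplicates."""
    norm = [normalize(n) for n in names]
    return [(i, j) for i, j in combinations(range(len(norm)), 2)
            if SequenceMatcher(None, norm[i], norm[j]).ratio() >= threshold]
```

Each returned pair indicates two pieces of supervision data that, under this assumed policy, indicate the same supervised object and can then be passed to the deduplication step.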
In an embodiment shown, the data processing apparatus may further include a repair module, which may repair abnormal data by using the pre-stored historical supervision data. Specifically, the repair module may first determine whether an abnormality exists in the processed supervision data by calling a preset abnormality detection algorithm. In general, a piece of supervision data may be regarded as abnormal if the attribute of the supervised object it carries is abnormal, if the identifier of the supervised object it carries is abnormal, or if both are abnormal. If the processed supervision data does contain abnormal content, the repair module may obtain historical supervision data corresponding to the abnormal supervision data from the pre-stored historical supervision data, and repair the abnormal supervision data according to the obtained historical supervision data.
The abnormality detection algorithm may be designed by those skilled in the art according to specific requirements and with reference to the related art, and is not described in detail in this specification. The repair may directly replace the abnormal part with the corresponding normal part of the obtained historical supervision data, or may adopt any other feasible repair manner; those skilled in the art may select a specific implementation with reference to the related art, and this specification does not limit it.
Embodiments of the present specification further provide a computer device, which at least includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the foregoing data processing method when executing the program.
Fig. 5 is a schematic diagram of a more specific hardware structure of a computing device according to an embodiment of the present disclosure. The computing device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050, where the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another within the device via the bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), one or more integrated circuits, or the like, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module to input and output information. The input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like; output devices may include a display, a speaker, a vibrator, an indicator light, and the like.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to implement communication between this device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Embodiments of the present specification also provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the aforementioned data processing method.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of this specification can be implemented by software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of the embodiments of this specification may be embodied, in whole or in part, in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in some parts thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly, and reference may be made to the corresponding parts of the method embodiment. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when the embodiments of the present disclosure are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing describes only specific implementations of the embodiments of the present disclosure. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of the present disclosure, and such improvements and refinements shall also be regarded as falling within the protection scope of the embodiments of the present disclosure.