CN116975033A - Data processing method, device, equipment and storage medium - Google Patents

Info

Publication number
CN116975033A
Authority
CN
China
Prior art keywords
data
repository
target
file
changed
Prior art date
Legal status
Pending
Application number
CN202310943502.7A
Other languages
Chinese (zh)
Inventor
张煜
周淳
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310943502.7A
Publication of CN116975033A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data processing method, apparatus, device, medium, and program product, which can be applied in the fields of computer technology and financial technology. The method includes: in response to receiving a data change request for a first repository, determining a second repository associated with the first repository based on a version library identifier in the data change request, where the data change request further includes changed data, the changed data is data that was changed in the first repository, and the version library corresponding to the version library identifier is generated from the data in the first repository and the second repository; determining, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository; detecting the changed data based on the plurality of target data to obtain a detection result; and, when the detection result indicates that the changed data is abnormal, responding to the data change request by rejecting the changed data so that it does not enter the version library.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the fields of computer technology and financial technology, and in particular to a data processing method, apparatus, device, medium, and program product.
Background
A version library is typically deployed on the server side and contains data resources that implement specific business processes, such as code and configuration files. These data resources are usually obtained from several different repositories, which may contain duplicate data resources. If duplicate data resources end up in the version library, errors may occur when the server loads it, and a new version library must then be generated to replace it.
In the course of arriving at the concept of the present disclosure, the inventors found at least the following problem in the related art: regenerating a new version library wastes resources and reduces processing efficiency.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a data processing method, apparatus, device, medium, and program product.
According to a first aspect of the present disclosure, there is provided a data processing method comprising:
in response to receiving a data change request for a first repository, determining a second repository associated with the first repository based on a version library identifier in the data change request, where the data change request further includes changed data, the changed data is data that was changed in the first repository, and the version library corresponding to the version library identifier is generated from the data in the first repository and the second repository; determining, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository; detecting the changed data based on the plurality of target data to obtain a detection result; and, when the detection result indicates that the changed data is abnormal, responding to the data change request by rejecting the changed data so that it does not enter the version library.
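The four operations can be traced in a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the disclosure: the `DataChangeRequest` fields, the in-memory dictionaries standing in for repositories and version library records, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold. It also assumes the repositories hold committed content only, so the pending change is never compared with itself.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class DataChangeRequest:
    first_repo: str          # repository in which the data was changed
    version_library_id: str  # identifies the version library built from several repositories
    target_file: str         # identifier of the file holding the changed data
    changed_data: str        # the changed data item, e.g. a service or key name


def similarity(a: str, b: str) -> float:
    # Assumed similarity measure; the disclosure does not fix one.
    return SequenceMatcher(None, a, b).ratio()


def handle_change_request(req, version_libraries, repositories, threshold=0.9):
    """Return True if the change may enter the version library, False if it is rejected.

    version_libraries: version library id -> names of the repositories that build it
    repositories: repository name -> {file identifier -> data items}; committed
    content only, so the pending change is not compared with itself.
    """
    # S310: the second repositories are the other members of the same version library.
    members = version_libraries[req.version_library_id]
    second_repos = [name for name in members if name != req.first_repo]

    # S320: gather target data from the first and second repositories
    # (simplified here to the data items of files with the same identifier).
    targets = []
    for name in [req.first_repo, *second_repos]:
        targets.extend(repositories[name].get(req.target_file, []))

    # S330: the changed data is abnormal if any target data is too similar to it.
    abnormal = any(similarity(req.changed_data, t) > threshold for t in targets)

    # S340: reject abnormal data; otherwise accept it.
    return not abnormal


version_libraries = {"version-lib-1": ["code-lib-1", "shared-lib"]}
repositories = {
    "code-lib-1": {"app.xml": ["orderService"]},
    "shared-lib": {"app.xml": ["paymentService"]},
}
request = DataChangeRequest("code-lib-1", "version-lib-1", "app.xml", "paymentService")
print(handle_change_request(request, version_libraries, repositories))  # False
```

Here the change duplicates `paymentService`, which the shared library already contributes, so it is rejected before it reaches the version library. A dissimilar item such as `auditLog` would be accepted.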
According to an embodiment of the present disclosure, the first repository includes a target file, and the data processing method further includes: determining a home file type of the changed data based on attribute information of the changed data, where the home file type represents the type of the target file in which the changed data is located.
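One possible reading of "attribute information" is the path of the target file, with the home file type decided by its suffix. The sketch below assumes exactly that. The `HomeFileType` enum, the `file_path` attribute key, and the suffix set in `MERGED_SUFFIXES` are hypothetical choices, not taken from the disclosure.

```python
from enum import Enum
from pathlib import PurePosixPath


class HomeFileType(Enum):
    CONTENT_MERGE = "content_merge"          # file contents are merged across repositories
    CONTENT_NON_MERGE = "content_non_merge"  # files are packaged as whole, unmerged units


# Hypothetical policy: suffixes of files whose contents are merged during packaging.
MERGED_SUFFIXES = {".properties", ".yml", ".yaml", ".xml"}


def home_file_type(attributes: dict) -> HomeFileType:
    """Derive the home file type from the changed data's attribute information.

    Assumes the attributes carry the path of the target file under "file_path".
    """
    suffix = PurePosixPath(attributes["file_path"]).suffix.lower()
    if suffix in MERGED_SUFFIXES:
        return HomeFileType.CONTENT_MERGE
    return HomeFileType.CONTENT_NON_MERGE
```

For example, a change in `config/app.properties` would be treated as content-merge data, while a change to `src/OrderService.java` would not.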
According to an embodiment of the present disclosure, determining, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository includes: determining a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data; and determining the plurality of target data based on the plurality of first target data and the plurality of second target data.
According to an embodiment of the present disclosure, the home file type includes a content merge type, and determining a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data includes: when the home file type of the changed data is the content merge type, determining, based on the file identifier of the target file in which the changed data is located, a first file to be merged from the first repository and a second file to be merged from the second repository, each of which is to be content-merged with the target file; determining the plurality of first target data based on the data included in the first file to be merged; and determining the plurality of second target data based on the data included in the second file to be merged.
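For the content merge type, here is a sketch under a hypothetical merge rule: files with the same file name as the target file are merged with it, both inside the first repository (excluding the target file itself) and in the second repository. Repositories are modelled as dictionaries from file path to the data items in that file. `files_to_merge` and `merge_type_targets` are illustrative names.

```python
from pathlib import PurePosixPath


def files_to_merge(repo, target_path, exclude_target):
    """Paths in `repo` whose content would be merged with the target file.

    Hypothetical merge rule: files sharing the target file's name are merged.
    """
    name = PurePosixPath(target_path).name
    return [
        path for path in repo
        if PurePosixPath(path).name == name
        and not (exclude_target and path == target_path)
    ]


def merge_type_targets(first_repo, second_repo, target_path):
    """Return (first target data, second target data) for a content-merge change."""
    first_files = files_to_merge(first_repo, target_path, exclude_target=True)
    second_files = files_to_merge(second_repo, target_path, exclude_target=False)
    # Every data item in a file to be merged becomes a candidate target data.
    first_targets = [item for path in first_files for item in first_repo[path]]
    second_targets = [item for path in second_files for item in second_repo[path]]
    return first_targets, second_targets
```

With `conf/app.yaml` as the target file, a first repository that also holds `conf/prod/app.yaml` yields that file's keys as first target data, and a shared `shared/app.yaml` yields the second target data.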
According to an embodiment of the present disclosure, the home file type includes a content non-merge type, and determining a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data includes: when the home file type of the changed data is the content non-merge type, extracting a keyword from the file identifier of the target file to obtain a target keyword; and, based on the target keyword, determining from the first repository a plurality of first target file identifiers matching the target keyword as the plurality of first target data, and determining from the second repository a plurality of second target file identifiers matching the target keyword as the plurality of second target data.
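For the content non-merge type, the keyword extraction and matching steps might look as follows. The keyword rule (the lower-cased file stem with any trailing version number stripped) and the substring match are assumptions chosen for illustration. `extract_keyword` and `matching_file_ids` are hypothetical helpers.

```python
import re
from pathlib import PurePosixPath


def extract_keyword(file_id: str) -> str:
    """Hypothetical rule: the lower-cased file stem with a trailing version number removed.

    "lib/order-client-1.2.3.jar" -> "order-client"
    """
    stem = PurePosixPath(file_id).stem.lower()       # "order-client-1.2.3"
    return re.sub(r"[-_.]?\d+(\.\d+)*$", "", stem)   # strip "-1.2.3"


def matching_file_ids(file_ids, keyword, exclude=None):
    """File identifiers whose file name contains the target keyword."""
    return [
        f for f in file_ids
        if f != exclude and keyword in PurePosixPath(f).name.lower()
    ]
```

In use, the target file itself is passed as `exclude` when searching the first repository, so only other matching files count as first target data.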
According to an embodiment of the present disclosure, detecting the changed data based on the plurality of target data to obtain a detection result includes: when the home file type of the changed data is the content merge type, calculating the similarity between the data information of each target data and the data information of the changed data to obtain a plurality of first similarity values, where the data information includes a data identifier and interface information; and, when a first similarity value exceeding a target threshold (a first target similarity value) exists among the plurality of first similarity values, determining a detection result indicating that the changed data is abnormal.
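The first similarity values can be illustrated with the standard library's `difflib.SequenceMatcher`. The disclosure names the data identifier and the interface information but not the measure, so averaging the two string ratios and the 0.8 threshold are assumptions of this sketch, as are the dictionary keys `id` and `interface`.

```python
from difflib import SequenceMatcher


def info_similarity(a: dict, b: dict) -> float:
    """Assumed measure: mean string similarity of the data identifier and interface information."""
    id_sim = SequenceMatcher(None, a["id"], b["id"]).ratio()
    if_sim = SequenceMatcher(None, a["interface"], b["interface"]).ratio()
    return (id_sim + if_sim) / 2


def detect_merge_type(changed: dict, targets: list, threshold: float = 0.8) -> bool:
    """Return True (abnormal) if any first similarity value exceeds the threshold."""
    first_similarities = [info_similarity(changed, t) for t in targets]
    return any(s > threshold for s in first_similarities)
```

A change that redeclares an existing service under the same identifier and interface scores 1.0 and is flagged. An unrelated entry stays well below the threshold.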
According to an embodiment of the present disclosure, detecting the changed data based on the plurality of target data to obtain a detection result includes: when the home file type of the changed data is the content non-merge type, calculating the similarity between the file identifier of the target file and each of the plurality of target data to obtain a plurality of second similarity values; and, when a second similarity value exceeding the target threshold (a second target similarity value) exists among the plurality of second similarity values, determining a detection result indicating that the changed data is abnormal.
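The second similarity values work the same way. This sketch compares the target file's name with each target file identifier, ignoring directories. The measure (the `SequenceMatcher` ratio of lower-cased file names) and the 0.85 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from pathlib import PurePosixPath


def name_similarity(file_id_a: str, file_id_b: str) -> float:
    """Assumed measure: string similarity of the two file names (directories ignored)."""
    a = PurePosixPath(file_id_a).name.lower()
    b = PurePosixPath(file_id_b).name.lower()
    return SequenceMatcher(None, a, b).ratio()


def detect_non_merge_type(target_file_id: str, target_file_ids: list, threshold: float = 0.85) -> bool:
    """Return True (abnormal) if any second similarity value exceeds the threshold."""
    second_similarities = [name_similarity(target_file_id, f) for f in target_file_ids]
    return any(s > threshold for s in second_similarities)
```

With these settings `lib/order-client-1.2.3.jar` and `shared/order-client-1.3.0.jar` score about 0.91 and are flagged as two versions of the same artifact. An unrelated name such as `pay-core-2.0.jar` cannot reach the threshold.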
According to an embodiment of the present disclosure, the method further includes: identifying, based on a target data identifier of the changed data, a plurality of data identifiers in a shared space to obtain an identification result, where the target data identifier of the changed data is the same as that of the data before the change, the data identifiers in the shared space correspond one-to-one to a plurality of target historical data, and the target historical data were determined from the first repository and the second repository in response to historical data change requests; when the identification result indicates that no data identifier associated with the target data identifier exists in the shared space, determining a plurality of target data associated with the changed data from the first repository and the second repository respectively; and storing the plurality of target data and the target data identifier in the shared space.
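The shared space behaves like a cache keyed by the target data identifier. Here is a minimal in-memory sketch. It assumes a plain dictionary and a caller-supplied `collect` function that gathers target data from the first and second repositories. `SharedSpace` and `targets_for` are hypothetical names.

```python
from typing import Callable, Dict, List


class SharedSpace:
    """Hypothetical in-memory shared space: target data identifier -> cached target data."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}

    def targets_for(self, target_data_id: str, collect: Callable[[], List[str]]) -> List[str]:
        # Identification step: is there already an entry for this data identifier?
        if target_data_id in self._entries:
            return self._entries[target_data_id]
        # Not found: collect target data from the first and second repositories ...
        targets = collect()
        # ... and store them under the identifier for later change requests.
        self._entries[target_data_id] = targets
        return targets
```

A second change to the same data item then reuses the stored target data instead of scanning both repositories again. Because entries are keyed only by identifier, they can become stale when the repositories change, so some invalidation policy would likely be needed in practice.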
A second aspect of the present disclosure provides a data processing apparatus, including a second repository determination module, a target data determination module, a data detection module, and a data response module.
The second repository determination module is configured to, in response to receiving a data change request for the first repository, determine a second repository associated with the first repository based on a version library identifier in the data change request, where the data change request further includes changed data, the changed data is data that was changed in the first repository, and the version library corresponding to the version library identifier is generated from the data in the first repository and the second repository. The target data determination module is configured to determine, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository. The data detection module is configured to detect the changed data based on the plurality of target data to obtain a detection result. The data response module is configured to, when the detection result indicates that the changed data is abnormal, respond to the data change request by rejecting the changed data so that it does not enter the version library.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.
With the data processing method, apparatus, device, medium, and program product provided by the present disclosure, when a data change request for a first repository is received, a second repository associated with the first repository is determined based on the version library identifier in the data change request. The data change request is a request to change data in a version library, and the version library is generated from the data in the first repository and the second repository. A plurality of target data associated with the changed data can be determined based on the changed data, and the changed data can be detected against them, so that the changed data is refused entry into the version library when the detection result indicates that it is abnormal. Because the changed data is checked comprehensively against target data drawn from both the first repository and the second repository before it enters the version library, the technical problems of wasted resources and reduced processing efficiency are at least partially solved, the usability of the version library is ensured, and the resource waste caused by reworking the version library is avoided.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1A schematically illustrates version package generation and deployment in the related art, according to an embodiment of the present disclosure.
Fig. 1B schematically illustrates the detection of data in a code library and a shared library in the related art, according to an embodiment of the present disclosure.
Fig. 2 schematically illustrates an application scenario of a data processing method, apparatus, device, medium, and program product according to an embodiment of the present disclosure.
Fig. 3 schematically illustrates a flowchart of a data processing method according to an embodiment of the present disclosure.
Fig. 4 schematically illustrates a flowchart of determining target data according to an embodiment of the present disclosure.
Fig. 5 schematically illustrates a data processing method according to another embodiment of the present disclosure.
Fig. 6 schematically illustrates a block diagram of a data processing method according to an embodiment of the present disclosure.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression such as "at least one of A, B, and C" is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
In the technical solution of the present disclosure, the user information involved (including but not limited to personal information, image information, and device information such as location) and data (including but not limited to data used for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of such data comply with the relevant laws, regulations, and standards of the relevant countries and regions, necessary security measures are taken, public order and good morals are not violated, and corresponding entry points are provided for users to grant or refuse authorization.
Research has found that, with the development of information technology, program or system development often requires obtaining data resources, such as code and configuration files, from private or shared repositories, packaging them into a version library, and deploying the version library on specific servers. A repository may be a code repository containing the code, configuration files, and other data that need to be published to the version library.
FIG. 1A schematically illustrates version library generation and deployment in the related art, according to an embodiment of the present disclosure.
As shown in FIG. 1A, a compilation policy is applied when a version library is generated from a code library, converting the code resources in the code library into resources usable in the version library. Different deployment policies also exist for different application scenarios, typically arising from differing environments and configuration requirements. Through these deployment policies the version library is deployed to the corresponding server, where running it carries out the corresponding business processing.
FIG. 1B schematically illustrates the generation of a version library from multiple repositories in the related art, according to an embodiment of the present disclosure.
As shown in FIG. 1B, the repositories may include private code repositories and shared code repositories, referred to below as private libraries and shared libraries. To increase code reuse, save development cost, and reduce the error rate, common code and common configuration files may be maintained centrally in a shared library, while the code and configuration files specific to each server are maintained in a private library. The files of the private library and the shared library are then merged and packaged, and configuration files may be merged according to a policy.
Typically, the data transferred from the shared library or a private library into the version library is checked one library at a time. As shown in FIG. 1B, on the private-library side, the data resources submitted to code library 1 and to code library 2 can each be checked, and the data resources submitted to the shared library can also be checked. The files of code library 1 and the shared library are merged and packaged into version library 1, and the data resources of code library 2 and the shared library are merged and packaged into version library 2. Because each library is checked in isolation, data resources duplicated across libraries (for example, in version library 1) cannot be caught in advance. Such duplicates cause errors that make the version library unusable when it is loaded, so a new version library has to be generated to replace it, which wastes computing resources and lowers processing efficiency. Even when loading succeeds, duplicate data resources may carry the same configuration information, producing redundant configuration that wastes storage resources and may cause repeated loading.
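A toy example makes the gap concrete. With hypothetical configuration keys, checking each library alone finds no duplicates, while the merged content of version library 1 does:

```python
from collections import Counter


def duplicates(items):
    """Return, sorted, the items that occur more than once."""
    return sorted(k for k, n in Counter(items).items() if n > 1)


# Hypothetical contents: configuration keys contributed by each library.
code_library_1 = ["db.url", "order.timeout"]
shared_library = ["db.url", "log.level"]

# Checking each library on its own finds nothing ...
per_library = duplicates(code_library_1) + duplicates(shared_library)
# ... but the packaged version library 1 contains "db.url" twice.
version_library_1 = duplicates(code_library_1 + shared_library)

print(per_library)        # []
print(version_library_1)  # ['db.url']
```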
In view of this, embodiments of the present disclosure provide a data processing method that includes: in response to receiving a data change request for a first repository, determining a second repository associated with the first repository based on a version library identifier in the data change request, where the data change request further includes changed data, the changed data is data that was changed in the first repository, and the version library corresponding to the version library identifier is generated from the data in the first repository and the second repository; determining, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository; detecting the changed data based on the plurality of target data to obtain a detection result; and, when the detection result indicates that the changed data is abnormal, responding to the data change request by rejecting the changed data so that it does not enter the version library.
Fig. 2 schematically illustrates an application scenario diagram of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the application scenario 200 according to this embodiment may include a first terminal device 201, a second terminal device 202, a third terminal device 203, a network 204, and a server 205. The network 204 is a medium used to provide a communication link between the first terminal device 201, the second terminal device 202, the third terminal device 203, and the server 205. The network 204 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may use at least one of the first terminal device 201, the second terminal device 202, and the third terminal device 203 to interact with the server 205 through the network 204, for example to receive or send messages. Various communication client applications may be installed on the first terminal device 201, the second terminal device 202, and the third terminal device 203, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software (by way of example only).
The first terminal device 201, the second terminal device 202, the third terminal device 203 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 205 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by the user using the first terminal device 201, the second terminal device 202, and the third terminal device 203. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the data processing method provided in the embodiments of the present disclosure may be generally performed by the server 205. Accordingly, the data processing apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 205. The data processing method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 205 and is capable of communicating with the first terminal device 201, the second terminal device 202, the third terminal device 203, and/or the server 205. Accordingly, the data processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster, which is different from the server 205 and is capable of communicating with the first terminal device 201, the second terminal device 202, the third terminal device 203 and/or the server 205.
It should be understood that the number of terminal devices, networks and servers in fig. 2 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The data processing method of the disclosed embodiment will be described in detail below with reference to fig. 3 to 5 based on the scenario described in fig. 2.
Fig. 3 schematically illustrates a flow chart of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 3, the data processing method of this embodiment includes operations S310 to S340.
In operation S310, in response to receiving a data change request for the first repository, a second repository associated with the first repository is determined based on the version library identifier in the data change request. The data change request further includes changed data, which is data that was changed in the first repository, and the version library corresponding to the version library identifier is generated from the data in the first repository and the second repository.
According to an embodiment of the present disclosure, the data change request is generated after data in the first repository is changed, where the data change request includes information such as changed data in the first repository, a version library identifier, and the like.
According to an embodiment of the present disclosure, the version library corresponding to the changed data can be determined from the version library identifier. The identifier also identifies the other repositories that, together with the first repository, form the version library, namely the second repositories. The number of second repositories is not limited; there may be one or more.
According to an embodiment of the present disclosure, the manner of determining the second repository from the version library identifier is not limited. For example, the first repository and the second repository that constitute the version library may be determined, according to the version library identifier, from a predetermined storage file that stores version library information. The predetermined storage file may include a plurality of version library identifiers together with the identifiers of the first and second repositories that form each version library. Based on the second repository identifier, the address of the second repository can then be determined from the repository address information included in the predetermined storage file, so that data can be obtained from the second repository. There may be one or more predetermined storage files.
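Here is a sketch of the lookup through predetermined storage files. It assumes they are JSON documents with the hypothetical keys `version_libraries` and `repository_addresses`. The layout, the repository names, and the example.com addresses are invented for illustration.

```python
import json

# Hypothetical predetermined storage file; layout and addresses are assumptions.
STORAGE_FILE = """
{
  "version_libraries": {
    "version-lib-1": ["code-lib-1", "shared-lib"],
    "version-lib-2": ["code-lib-2", "shared-lib"]
  },
  "repository_addresses": {
    "code-lib-1": "https://git.example.com/code-lib-1.git",
    "code-lib-2": "https://git.example.com/code-lib-2.git",
    "shared-lib": "https://git.example.com/shared-lib.git"
  }
}
"""


def second_repository_addresses(storage_texts, version_library_id, first_repo_id):
    """Map each second repository of a version library to its address.

    storage_texts: contents of one or more predetermined storage files.
    """
    libraries, addresses = {}, {}
    for text in storage_texts:
        info = json.loads(text)
        libraries.update(info.get("version_libraries", {}))
        addresses.update(info.get("repository_addresses", {}))
    members = libraries[version_library_id]
    # Every member other than the first repository is a second repository.
    return {repo: addresses[repo] for repo in members if repo != first_repo_id}
```

Accepting a list of file contents reflects that there may be one storage file or several. Later files simply override earlier entries in this sketch.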
According to an embodiment of the present disclosure, each of the first repository and the second repository may be a shared repository or a private repository. A shared repository stores data with commonality, such as public code and public configuration files, while a private repository may include data developed for business needs, such as private code and private configuration files.
According to an embodiment of the present disclosure, the version library identifier is obtained by parsing the data change request, and the second repository, from which the version library was generated together with the first repository, is determined from that identifier. The second repository associated with the first repository, that is, the repository that may contain data duplicating or conflicting with the changed data, can thus be determined quickly.
In operation S320, a plurality of target data associated with the changed data is determined from the first repository and the second repository based on the changed data.
According to an embodiment of the present disclosure, a plurality of target data may be determined, based on the changed data, from the data included in the first repository and the second repository. Different changed data yield different target data, and different types of changed data yield different types of target data.
According to embodiments of the present disclosure, different types of target data may be obtained depending on the changed data category, for example: the changed data may be classified according to the type of the file belonging to the changed data, so as to obtain different types of changed data, different types of target data may exist for the different types of changed data, the type of the file belonging to the changed data may be the type of the file comprising the changed data, and the type of the file belonging to the changed data may be determined from the attribute of the file.
According to embodiments of the present disclosure, the target data may be obtained from the first repository and the second repository using, for example, the file identifier or the file suffix of the target file that contains the changed data.
According to the embodiment of the disclosure, determining the target data according to the changed data allows the target data to be tailored to different requirements, which makes it diverse and accurate and helps ensure, at the data source, that the changed data is detected efficiently and accurately. In addition, obtaining the target data from both the first repository and the second repository, which jointly generate the version library, makes the data source comprehensive. By contrast, if the target data were obtained from the version library, which holds only historical data, full detection would not be possible when the home file type of the changed data is the content merge type; obtaining the target data from both repositories avoids this limitation.
In operation S330, the changed data is detected based on the plurality of target data, and a detection result is obtained.
According to embodiments of the present disclosure, different detection manners may be used for different changed data, for example depending on the home file type of the changed data. Specifically, when the home file type of the changed data is the content merge type, the data information of the changed data is detected; when the home file type is the content non-merge type, the file identifier of the target file in which the changed data is located may be detected.
The embodiments of the present disclosure allow various ways of detecting the changed data. For example, for either home file type, a similarity may be calculated between the data information or file identifier of the changed data and each of the plurality of target data; if any similarity value exceeds a threshold, the changed data is regarded as abnormal data. Alternatively, the data information or file identifier of the changed data may be matched against the target data, and the changed data is regarded as abnormal data if any target data matches.
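The two detection options can be sketched as follows. This is a hedged illustration: the disclosure does not name a similarity measure, so Python's `difflib.SequenceMatcher.ratio()` is swapped in here, and a value that reaches the threshold (rather than strictly exceeding it) is treated as abnormal so that a threshold of 1 corresponds to identical strings.

```python
from difflib import SequenceMatcher
from typing import Iterable


def abnormal_by_similarity(changed: str, targets: Iterable[str],
                           threshold: float = 1.0) -> bool:
    """True if any target reaches the similarity threshold.

    ratio() returns a value in [0, 1]; with threshold=1.0 only an
    identical string triggers the check.
    """
    return any(SequenceMatcher(None, changed, t).ratio() >= threshold
               for t in targets)


def abnormal_by_matching(changed: str, targets: Iterable[str]) -> bool:
    """True if some target matches the changed value exactly."""
    return changed in set(targets)
```

Lowering the threshold below 1 lets the similarity variant catch near-duplicates such as `userDao` and `userDa0`, which exact matching would miss.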
According to an embodiment of the present disclosure, when the second repository determined from the version library identifier is empty, that is, when the version library is generated from the first repository alone, only the first repository needs to be examined. Therefore, upon receiving a data change request, it may first be determined whether there is a second repository that generates the version library together with the first repository.
According to the embodiment of the disclosure, the changed data is detected by calculating the similarity between, or by matching, its data information or file identifier and each of the plurality of target data. Detection therefore takes place before the changed data enters the version library. This avoids having to regenerate the version library after an error is reported for problematic changed data, and avoids redundant data in the version library, saving computing and storage resources to some extent.
In operation S340, in the case where the detection result indicates that the changed data is abnormal data, a response to the data change request is returned that rejects the changed data from entering the version library.
According to the embodiment of the disclosure, when the detection result indicates that the changed data is abnormal data, the response to the data change request may take the form of an alert message notifying the client that sent the request that the changed data has a problem. The specific location of the problem may be determined from the detection result; for example, the file identifier of the file containing the changed data may duplicate another file identifier, or the data information of the changed data may duplicate the data information of another file.
According to an embodiment of the disclosure, when the detection result indicates that the changed data is normal data, a response to the data change request is returned that allows the changed data to enter the version library.
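Operations S310 to S340 can be strung together in a minimal sketch such as the one below. The request fields, the callback parameters, and the representation of each repository as a list of existing identifiers are assumptions made for illustration, not the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataChangeRequest:
    version_library_id: str
    first_repository_id: str
    changed_data: str  # e.g. the identifier of the changed item


@dataclass
class ChangeResponse:
    accepted: bool
    reason: str = ""


def handle_change_request(
    request: DataChangeRequest,
    find_second_repository: Callable[[str], Optional[str]],
    repositories: Dict[str, List[str]],
    is_abnormal: Callable[[str, List[str]], bool],
) -> ChangeResponse:
    # S310: resolve the second repository; None means the version library
    # is generated from the first repository alone.
    second_id = find_second_repository(request.version_library_id)
    repo_ids = [request.first_repository_id]
    if second_id is not None:
        repo_ids.append(second_id)
    # S320: gather target data. `repositories` is assumed to hold the
    # existing data other than the item being changed.
    targets = [item for rid in repo_ids for item in repositories.get(rid, [])]
    # S330: detect the changed data against the target data.
    if is_abnormal(request.changed_data, targets):
        # S340: reject entry into the version library.
        return ChangeResponse(False, f"{request.changed_data!r} duplicates existing data")
    # Normal data: allow entry into the version library.
    return ChangeResponse(True)
```

With `repositories = {"repo-a": ["a.xml"], "repo-x": ["shared.xml"]}`, the lookup `{"VL-001": "repo-x"}.get`, and exact matching, a request adding `shared.xml` to `repo-a` under `VL-001` is rejected, while `new.xml` is accepted.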
According to the data processing method, apparatus, device, medium and program product provided by the present disclosure, upon receiving a data change request for a first repository, a second repository associated with the first repository is determined based on the version library identifier in the data change request. The data change request is a request to change data in a version library, and the version library is generated from the data in the first repository and the second repository. A plurality of target data associated with the changed data is determined from the first repository and the second repository based on the changed data, and the changed data is detected based on these target data, so that the changed data is rejected from entering the version library when the detection result indicates that it is abnormal data. Because the changed data is comprehensively checked against the target data contained in the first repository and the second repository before it enters the version library, the technical problems of wasted resources and reduced processing efficiency are at least partially solved, the usability of the version library is ensured, and the waste of resources caused by reworking the version library is avoided.
According to an embodiment of the present disclosure, the first repository includes a target file, and the data processing method may further include: determining the home file type of the changed data based on the attribute information of the changed data, where the home file type represents the type of the target file in which the changed data is located.
According to an embodiment of the present disclosure, the first repository may include a target file, and the home file type of the changed data may be determined by analyzing the attribute information of the changed data. The attribute information may include, for example, the path at which the data is located and the identifier of the file containing the data; for instance, the path of data A may be \com\person\db\.
According to the embodiment of the disclosure, the home file type of the changed data may be determined from the file attributes associated with the file identifier of the target file, or it may be obtained, using the file identifier, from the predetermined storage file based on a stored correspondence between file identifiers and file types.
According to an embodiment of the present disclosure, the home file type may be the content merge type or the content non-merge type. For example, if the file attributes of file A indicate that file A needs to be merged with file B to obtain a new file C, then file A and file B are content merge type files. If the file attributes of file A indicate that file A does not need to be merged with any other file, or do not record whether file A needs to be merged with other files, then file A is a content non-merge type file.
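A minimal sketch of this classification follows, assuming the attributes are available as a mapping and that a merge relationship is recorded under a hypothetical `merge_with` key. The optional `type_table` stands in for the correspondence between file identifiers and file types kept in the predetermined storage file.

```python
from enum import Enum
from typing import Dict, Mapping, Optional


class HomeFileType(Enum):
    CONTENT_MERGE = "content_merge"
    CONTENT_NON_MERGE = "content_non_merge"


def determine_home_file_type(
    file_id: str,
    file_attributes: Mapping[str, object],
    type_table: Optional[Dict[str, HomeFileType]] = None,
) -> HomeFileType:
    """Classify the target file that contains the changed data."""
    # Prefer an explicit identifier-to-type entry when one is configured.
    if type_table is not None and file_id in type_table:
        return type_table[file_id]
    # Merge partners recorded -> content merge type.
    if file_attributes.get("merge_with"):
        return HomeFileType.CONTENT_MERGE
    # Merge explicitly not needed, or not recorded at all -> non-merge type.
    return HomeFileType.CONTENT_NON_MERGE
```

Treating an absent `merge_with` the same as an empty one mirrors the rule above that a file whose attributes do not record a merge requirement is a content non-merge type file.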
Fig. 4 schematically illustrates a flow chart of determining target data according to an embodiment of the disclosure.
As shown in fig. 4, determining the target data includes operations S321 to S322.
In operation S321, a plurality of first target data is determined from the first repository and a plurality of second target data is determined from the second repository based on the home file type of the changed data.
In operation S322, a plurality of target data is determined based on the plurality of first target data and the plurality of second target data.
According to an embodiment of the present disclosure, a plurality of first target data may be determined from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data. Different home file types lead to different types of first and second target data, but within any one determination the first target data and the second target data are of the same type.
According to embodiments of the present disclosure, the target data may be determined in different ways for different home file types. For example, when the home file type of the changed data is the content non-merge type, the target data may be determined from the file identifier of the target file in which the changed data is located.
According to the embodiment of the disclosure, determining the target data according to the home file type applies a different processing mode to each situation, so that the target data can be determined more accurately in each mode. Only the target data corresponding to the home file type needs to be extracted rather than all of the data, which further improves the efficiency of data acquisition.
According to an embodiment of the present disclosure, the home file type includes a content merge type, and determining a plurality of first target data from a first repository and a plurality of second target data from a second repository based on the home file type of the changed data may include the following operations.
When the home file type of the changed data is the content merge type: based on the file identifier of the target file in which the changed data is located, determining from the first repository a first file to be merged, which is to be content-merged with the target file, and determining from the second repository a second file to be merged, which is to be content-merged with the target file; determining a plurality of first target data based on the data included in the first file to be merged; and determining a plurality of second target data based on the data included in the second file to be merged.
According to the embodiment of the disclosure, when the home file type of the changed data is the content merge type, the file attributes of the target file may be determined from its file identifier, and the identifiers of the first and second files to be merged with the target file may then be read from those attributes or obtained from the predetermined storage file. The first file to be merged is located using the first to-be-merged file identifier, and the second file to be merged using the second to-be-merged file identifier. Either the first or the second file to be merged may be empty, but they cannot both be empty at the same time.
For example, according to embodiments of the present disclosure, it may be determined from the predetermined storage file that file C in the version library is obtained by merging the target file in the first repository with a second file to be merged in the second repository. The changed data has data identifier A and interface information a, and the first file to be merged is empty. The second target data obtained from the second file to be merged has data identifier B and interface information b.
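The collection of first and second target data for the content merge type (operations S321 and S322) might be sketched as follows, using the example above. The dictionaries `merge_partners`, `first_repo`, and `second_repo`, and the file names, are hypothetical stand-ins for the predetermined storage file and the two repositories; each data item is a (data identifier, interface information) pair.

```python
from typing import Dict, List, Optional, Tuple

DataItem = Tuple[str, str]  # (data identifier, interface information)


def merge_type_targets(
    target_file_id: str,
    merge_partners: Dict[str, Tuple[Optional[str], Optional[str]]],
    first_repo: Dict[str, List[DataItem]],
    second_repo: Dict[str, List[DataItem]],
) -> Tuple[List[DataItem], List[DataItem]]:
    """Return (first target data, second target data) for a merge-type file."""
    # Look up the first and second files to be merged with the target file.
    first_id, second_id = merge_partners[target_file_id]
    if first_id is None and second_id is None:
        raise ValueError("the files to be merged cannot both be empty")
    # An empty partner contributes no target data.
    first_targets = list(first_repo.get(first_id, [])) if first_id else []
    second_targets = list(second_repo.get(second_id, [])) if second_id else []
    return first_targets, second_targets
```

For the example above, `merge_type_targets("target.xml", {"target.xml": (None, "partner.xml")}, {}, {"partner.xml": [("B", "b")]})` yields no first target data and the single second target data `("B", "b")`; the plurality of target data is then the concatenation of the two lists.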
According to an embodiment of the present disclosure, the home file type includes a content non-merge type, and determining a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data may include the following operations.
When the home file type of the changed data is the content non-merge type, extracting keywords from the file identifier of the target file to obtain a target keyword; and, based on the target keyword, determining from the first repository a plurality of first target file identifiers matching the target keyword as the plurality of first target data, and determining from the second repository a plurality of second target file identifiers matching the target keyword as the plurality of second target data.
According to an embodiment of the present disclosure, when the home file type of the changed data is the content non-merge type, keywords may be extracted from the file identifier of the target file, for example by splitting the identifier into a prefix and a suffix. If the file identifier is abc.xlm, the prefix is abc and the suffix is xlm; when files are extracted from the first repository or the second repository based on the keyword, the identifiers of the files with the suffix xlm may be extracted as the target data.
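Prefix/suffix keyword extraction and suffix matching could be sketched like this. Splitting at the last dot and comparing suffixes case-insensitively are assumptions; the disclosure only states that the identifier may be split into a prefix and a suffix.

```python
from typing import Iterable, List, Tuple


def split_identifier(file_id: str) -> Tuple[str, str]:
    """Split a file identifier into (prefix, suffix) at the last dot."""
    prefix, dot, suffix = file_id.rpartition(".")
    if not dot:
        return file_id, ""  # no suffix present
    return prefix, suffix


def match_by_suffix(file_id: str, candidates: Iterable[str]) -> List[str]:
    """Candidate identifiers sharing the target's suffix (case-insensitive)."""
    suffix = split_identifier(file_id)[1].lower()
    return [c for c in candidates if split_identifier(c)[1].lower() == suffix]
```

Calling `match_by_suffix` once on the identifiers of the first repository and once on those of the second repository gives the first and second target data respectively. An identical identifier in either repository is deliberately kept, because that is the duplicate the later detection step must see.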
According to the embodiment of the disclosure, when the home file type of the changed data is the content non-merge type, the range from which target data is obtained may be determined as follows: the rule corresponding to the target file is looked up in a rule configuration table using the file identifier of the target file. If that rule states that the same file identifier cannot exist in the same folder or the same storage space, the other file identifiers stored in the same folder or storage space as the target file may be determined from the predetermined storage file, and the target data may then be determined from the first repository and the second repository based on those other file identifiers.
According to an embodiment of the present disclosure, if the rule corresponding to the target file states that files with the same file name are not allowed under the same directory and sub-directory, the other file identifiers stored in the version library under the same directory and sub-directory as the target file may be determined from the predetermined storage file based on the file identifier of the target file, and the target data may then be determined from the first repository and the second repository based on those other file identifiers.
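The "same directory and sub-directory" scope could be computed as below, assuming the stored files are known as (repository identifier, path) pairs with forward-slash paths; a backslash path such as \com\person\db\ would need normalizing first. The entry for the target file itself is skipped, but a file with the same path in the other repository is kept, since that is exactly the conflict the rule is meant to catch.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def same_parent_identifiers(
    target_repo: str,
    target_path: str,
    stored: Iterable[Tuple[str, str]],
) -> List[str]:
    """File names stored under the same parent directory as the target file."""
    parent = PurePosixPath(target_path).parent
    result = []
    for repo_id, path in stored:
        if (repo_id, path) == (target_repo, target_path):
            continue  # the target file itself (its pre-change version)
        p = PurePosixPath(path)
        if p.parent == parent:
            result.append(p.name)
    return result
```

For `stored = [("repo-a", "conf/app.xml"), ("repo-x", "conf/app.xml"), ("repo-x", "conf/db.xml"), ("repo-x", "conf/sub/app.xml")]`, the target `conf/app.xml` in `repo-a` yields `["app.xml", "db.xml"]`; the file under `conf/sub/` lies in a different directory and is excluded.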
According to the embodiment of the disclosure, once the rule corresponding to the target file has been determined, the range of target data can be narrowed further by keyword matching, such as matching on the suffix of the file identifier, so that fewer target data need to be compared in the subsequent detection and detection efficiency improves.
According to an embodiment of the present disclosure, the predetermined storage file may further record the association relationships between files in the first repository and the second repository, for example whether file A and file B are stored in the same folder, or whether file A and file B need to be merged.
According to an embodiment of the present disclosure, the rule configuration table records, for each version library group, the repository in which each file is located and the rule applied to it. Different rules have different rule types: the rule group of a file can be determined from its file identifier, and different rule groups correspond to different rule types. For example, rule 1 may state that the same file identifier cannot exist in the same folder or the same storage space, in which case its rule type is the folder type. Rule 2 may state that files with the same file name are not allowed under the same directory and sub-directory, in which case its rule type is the file parent path type. Other rules and rule groups can be set similarly according to different needs. The rule configuration table may be as follows:
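Because the contents of Table 1 are not reproduced here, the rows below are invented purely to illustrate how a rule might be looked up from a file's repository and path. The field names mirror the columns mentioned in the text; the rule type strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RuleRow:
    version_library_group: str
    repository_id: str
    file_path: str
    rule_group: str
    rule_type: str  # e.g. "folder" (rule 1) or "file_parent_path" (rule 2)


# Hypothetical rows; not the actual contents of Table 1.
RULE_TABLE: List[RuleRow] = [
    RuleRow("VLG-1", "repo-a", "conf/app.xml", "rule-1", "folder"),
    RuleRow("VLG-1", "repo-x", "conf/db.xml", "rule-2", "file_parent_path"),
]


def rule_for(repository_id: str, file_path: str,
             table: List[RuleRow] = RULE_TABLE) -> Optional[RuleRow]:
    """Return the rule row configured for a file, or None if none applies."""
    for row in table:
        if row.repository_id == repository_id and row.file_path == file_path:
            return row
    return None
```

The returned `rule_type` would then select the scope: a folder-type rule compares against files in the same folder or storage space, and a file-parent-path rule compares against files under the same directory and sub-directory.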
Table 1 rule configuration table
According to the embodiment of the present disclosure, the version library groups, repository identifiers, file paths, rule groups and rule types in Table 1 are only illustrative; version library groups, repositories, file identifiers, file paths, rule groups and rule types with different names, numbers and contents can be set according to actual requirements.
According to an embodiment of the present disclosure, detecting changed data based on a plurality of target data to obtain a detection result may include the following operations.
When the home file type of the changed data is the content merge type, performing a similarity calculation between the data information of each target data and the data information of the changed data to obtain first similarity values, where the data information includes a data identifier and interface information; and, when a first target similarity value exceeding a target threshold exists among the plurality of first similarity values, determining a detection result that characterizes the changed data as abnormal data.
According to the embodiment of the disclosure, when the home file type of the changed data is the content merge type, the similarity between the data information of each target data and that of the changed data may be calculated to obtain a first similarity value, with the data identifier and the interface information compared separately. If the similarity between the data identifier of any target data and the data identifier of the changed data exceeds the similarity threshold, the changed data may be determined to be abnormal data; likewise, if the similarity between the interface information of any target data and that of the changed data exceeds the similarity threshold, the changed data may be determined to be abnormal data.
According to an embodiment of the present disclosure, for the data identifier and the interface information of the changed data, if either item has target data whose similarity exceeds the similarity threshold, the changed data is considered abnormal data.
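A hedged sketch of the content-merge check follows: the data identifier and the interface information are compared separately, and the changed data is flagged if either reaches the threshold. As before, `difflib`'s ratio is an assumed similarity measure, and a threshold of 1 means exact equality.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple

DataInfo = Tuple[str, str]  # (data identifier, interface information)


def _similarity(a: str, b: str) -> float:
    """Illustrative similarity measure in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def merge_type_is_abnormal(changed: DataInfo, targets: Iterable[DataInfo],
                           threshold: float = 1.0) -> bool:
    """Abnormal if the data identifier OR the interface information of the
    changed data reaches the threshold against any target data."""
    changed_id, changed_iface = changed
    for target_id, target_iface in targets:
        if _similarity(changed_id, target_id) >= threshold:
            return True  # duplicate data identifier
        if _similarity(changed_iface, target_iface) >= threshold:
            return True  # duplicate interface information
    return False
```

With the earlier example, changed data `("A", "a")` against target data `("B", "b")` is normal, whereas `("A", "b")` would be flagged because its interface information duplicates that of the target.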
According to the embodiment of the present disclosure, the similarity threshold is not limited. It may be 1, which corresponds to the case where some target data has data information identical to the data information of the changed data.
According to the embodiment of the disclosure, calculating the similarity between the data information of each target data and that of the changed data, and treating the changed data as abnormal whenever some similarity exceeds the threshold, makes it possible to detect duplicated data identifiers and interface information among files that are to be content-merged. This prevents data from failing to load because of duplicate data identifiers after merging, and prevents redundant configuration caused by duplicate interface information.
According to an embodiment of the present disclosure, detecting changed data based on a plurality of target data to obtain a detection result may include the following operations.
When the home file type of the changed data is the content non-merge type, performing similarity calculations between the file identifier of the target file and the plurality of target data to obtain a plurality of second similarity values; and, when a second target similarity value exceeding the target threshold exists among the plurality of second similarity values, determining a detection result that characterizes the changed data as abnormal data.
According to the embodiment of the disclosure, the similarity may be calculated between the file identifier of the target file and each of the plurality of target data, giving one similarity value per target data.
According to the embodiment of the present disclosure, the similarity threshold is not limited, and different similarity thresholds may be set according to actual situations, for example: the similarity threshold may be 1.
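For the content non-merge type, the second similarity values and the identifiers that reach the threshold can be computed as follows. Returning the offending identifiers rather than a bare boolean is a design choice that supports reporting where the problem lies; the `difflib` measure and the example threshold of 0.9 are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def second_similarity_values(target_file_id: str, target_data: List[str]) -> List[float]:
    """One similarity value per target file identifier (difflib ratio)."""
    return [SequenceMatcher(None, target_file_id, t).ratio() for t in target_data]


def conflicting_identifiers(target_file_id: str, target_data: List[str],
                            threshold: float = 1.0) -> List[str]:
    """Target identifiers whose second similarity value reaches the threshold.

    A non-empty result means the changed data is abnormal.
    """
    values = second_similarity_values(target_file_id, target_data)
    return [t for t, v in zip(target_data, values) if v >= threshold]
```

For example, `conflicting_identifiers("app-config.xml", ["app_config.xml", "db.xml"], threshold=0.9)` returns `["app_config.xml"]`, while the default threshold of 1.0 returns an empty list for the same input.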
According to an embodiment of the present disclosure, the data processing method may further include the following operations.
Identifying, based on the target data identifier of the changed data, a plurality of data identifiers in a shared space to obtain an identification result, where the target data identifier of the changed data is the same as the target data identifier of the data before the change, the plurality of data identifiers in the shared space correspond one-to-one to a plurality of target historical data, and the target historical data were determined from the first repository and the second repository based on historical data change requests; when the identification result indicates that no data identifier associated with the target data identifier exists in the shared space, determining a plurality of target data associated with the changed data from the first repository and the second repository respectively; and storing the plurality of target data and the target data identifier in the shared space.
According to an embodiment of the present disclosure, a shared space may further be provided to store the target data associated with changed data. Upon receiving a data change request for the first repository, the plurality of data identifiers in the shared space may be checked for the target data identifier of the changed data. The target data identifier may depend on the home file type of the changed data: for the content merge type, it may be a data identifier or an interface identifier; for the content non-merge type, it may be the file identifier of the target file in which the changed data is located.
According to the embodiments of the present disclosure, the way the shared space is managed is not limited. For example, the data may be managed with the distributed cache of the NetEase Object Storage service (NOS), or with a persistence layer consisting of related management tables in a database. Alternatively, the database persistence layer may be kept synchronized while the NOS distributed cache is used, and the persistence layer takes over management when NOS is unavailable.
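The cache-plus-persistence arrangement can be illustrated with the standard library alone. In this sketch a Python dict stands in for the NOS distributed cache (no NOS client API is used or implied) and an in-memory `sqlite3` table stands in for the database management table; writes go to both, and reads fall back to the table when the cache is marked unavailable. Reconciling the cache after an outage is left out.

```python
import json
import sqlite3
from typing import Dict, List, Optional


class SharedSpace:
    """Shared space: a cache in front of a synchronized persistent table."""

    def __init__(self) -> None:
        self.cache: Dict[str, List[str]] = {}  # stand-in for the distributed cache
        self.cache_available = True
        self.db = sqlite3.connect(":memory:")  # stand-in for the database
        self.db.execute(
            "CREATE TABLE shared_space (target_id TEXT PRIMARY KEY, target_data TEXT)"
        )

    def put(self, target_id: str, target_data: List[str]) -> None:
        # Write-through: the persistent layer is always kept in sync.
        self.db.execute(
            "INSERT OR REPLACE INTO shared_space VALUES (?, ?)",
            (target_id, json.dumps(target_data)),
        )
        self.db.commit()
        if self.cache_available:
            self.cache[target_id] = list(target_data)

    def get(self, target_id: str) -> Optional[List[str]]:
        if self.cache_available and target_id in self.cache:
            return self.cache[target_id]
        # Cache miss or cache unavailable: read from the persistent layer.
        row = self.db.execute(
            "SELECT target_data FROM shared_space WHERE target_id = ?", (target_id,)
        ).fetchone()
        if row is None:
            return None
        data = json.loads(row[0])
        if self.cache_available:
            self.cache[target_id] = data  # repopulate the cache on a miss
        return data

    def delete(self, target_id: str) -> None:
        self.db.execute("DELETE FROM shared_space WHERE target_id = ?", (target_id,))
        self.db.commit()
        self.cache.pop(target_id, None)
```

Setting `cache_available = False` simulates NOS being unavailable: `get` then serves the same data from the persistent table.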
According to the embodiment of the disclosure, a data identifier associated with the target data identifier may be found by looking up, based on the target data identifier, the association relationships between the target data identifier and other data identifiers in an association relationship table of the shared space. The association relationship table stores the association relationships among a plurality of data identifiers.
According to the embodiment of the present disclosure, when neither the target data identifier nor any association relationship of the target data identifier exists in the association relationship table of the shared space, it may be concluded that the shared space contains no target data corresponding to the changed data. A plurality of target data may then be determined from the first repository and the second repository based on the changed data and stored in the shared space, and the target data identifier is stored in the association relationship table.
According to the embodiment of the disclosure, when the detection result indicates that the changed data is normal data, that is, no other data duplicates the data identifier, interface information, target file identifier, or the like of the changed data, the changed data may be added to the shared space for use when detecting the changed data of subsequent data change requests.
According to an embodiment of the present disclosure, when the data change request deletes the changed data, the changed data is also deleted from the shared space.
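The shared-space workflow (look up, fetch from the repositories on a miss, record normal data, remove deleted data) might be sketched as below. The dict keyed by target data identifier, the `load_targets` callback, and the use of exact matching are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def check_with_shared_space(
    shared: Dict[str, List[str]],
    target_id: str,
    changed_value: str,
    load_targets: Callable[[], List[str]],
    is_delete: bool = False,
) -> bool:
    """Return True if the change is accepted, False if it is rejected.

    load_targets is a hypothetical callback that reads target data from the
    first and second repositories; it runs only on a shared-space miss.
    """
    if is_delete:
        # A delete request also removes the changed data from the shared space.
        if target_id in shared:
            shared[target_id] = [v for v in shared[target_id] if v != changed_value]
        return True
    if target_id not in shared:
        # Miss: fetch once from the repositories and store under target_id.
        shared[target_id] = list(load_targets())
    targets = shared[target_id]
    if changed_value in targets:
        return False  # abnormal: duplicates existing data
    # Normal: record the changed data for later requests.
    targets.append(changed_value)
    return True
```

The repositories are read only on the first request for a given target data identifier; later requests with the same identifier are checked entirely against the shared space.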
According to the embodiment of the disclosure, the shared space stores the target data extracted from the first repository and the second repository each time a data change request is checked, so that later data change requests with the same target data identifier do not need to fetch the target data again, which saves resource overhead.
Fig. 5 schematically illustrates a schematic diagram of a data processing method according to another embodiment of the present disclosure.
According to an embodiment of the present disclosure, the data in the first repository 510 and the second repository 520 may be used to generate the version package 530. When the version package is generated from the first repository 510 and the second repository 520, or when a data change occurs, the data used to generate the version package, or the changed data, may be detected first, and the data is loaded into the version package only if the detection finds no duplicates.
According to embodiments of the present disclosure, a specific process may include the following operations: in response to receiving a data change request for the first repository 510, a second repository 520 associated with the first repository 510 is determined based on the version library identifier in the data change request; based on the changed data, it is determined from the shared space 540 whether there is a data identifier associated with the target data identifier of the changed data; if there is, the changed data may be detected using the target historical data corresponding to that associated data identifier, and when the detection result indicates that the changed data is abnormal data, a response to the data change request is returned that rejects the changed data from entering the version library 530.
In accordance with an embodiment of the present disclosure, in the absence of a data identification associated with a target data identification, a plurality of target data associated with the changed data may be determined from the first repository 510 and the second repository 520 based on the changed data; and detecting the changed data based on the plurality of target data to obtain a detection result.
According to the embodiment of the disclosure, with the above process, duplicate data can be detected before it enters the version library, whether the version library is being generated or data is being changed, and data conflicts or code conflicts can be detected in advance even when the repositories are decoupled.
According to the embodiment of the present disclosure, the management of the data in the shared space may also be achieved by establishing a shared space data management table, as shown in table 2 below.
Table 2 shared space data management table
According to the embodiment of the present disclosure, the version library group, repository identifier, target data identifier, rule group, rule type, associated data state, and other items in Table 2 are only illustrative; items may be removed or added as needed, and their values may be set differently according to actual needs.
According to an embodiment of the present disclosure, the changed data may also be detected in another manner based on Table 2 above. After a data change request is received, the state of the data identifier associated with the target data identifier may first be read from the shared space data management table. For example, if the state for the first repository is shown as present, the first target data determined from the first repository is considered to exist in the shared space; if the state for the second repository is shown as absent, no second target data determined from the second repository exists in the shared space.
According to an embodiment of the present disclosure, when no data identifier associated with the target data identifier exists for the second repository, it is determined whether the second repository has a data change for which a data change request has been generated. If so, a plurality of target data associated with the changed data is determined from the second repository, the target data is stored in the shared space, and the associated identifier state is updated to present.
According to the embodiment of the disclosure, when the associated identifier states for the first repository and the second repository are both present, the similarity between the file identifier of the target file and the target data may be calculated according to rule 1 to obtain the detection result. Rule 1 states that the same file identifier cannot exist in the same folder or the same storage space.
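The state-driven variant based on Table 2 could look like the following sketch. The `present`/`absent` strings, the per-repository `loaders` callbacks, and the keying of shared target data by repository identifier are hypothetical; rule 1 is applied as exact equality of file identifiers (similarity threshold 1).

```python
from typing import Callable, Dict, List, Optional

PRESENT, ABSENT = "present", "absent"


def refresh_absent(states: Dict[str, str],
                   shared_targets: Dict[str, List[str]],
                   loaders: Dict[str, Callable[[], List[str]]]) -> None:
    """Load target data for repositories whose state is absent, then mark
    them present (loaders are hypothetical per-repository fetch callbacks)."""
    for repo_id, state in list(states.items()):
        if state == ABSENT and repo_id in loaders:
            shared_targets[repo_id] = list(loaders[repo_id]())
            states[repo_id] = PRESENT


def check_rule_1(states: Dict[str, str],
                 shared_targets: Dict[str, List[str]],
                 target_file_id: str) -> Optional[bool]:
    """Return None while any repository's state is still absent; otherwise
    True if target_file_id already exists in some repository's target data."""
    if any(state != PRESENT for state in states.values()):
        return None
    return any(target_file_id in shared_targets.get(repo_id, [])
               for repo_id in states)
```

Returning `None` rather than `False` while a state is absent keeps "not yet checkable" distinct from "checked and normal".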
Based on the data processing method, the disclosure also provides a data processing device. The device will be described in detail below in connection with fig. 6.
Fig. 6 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the data processing apparatus 600 of this embodiment includes a second repository determination module 610, a target data determination module 620, a data detection module 630, and a data response module 640.
The second repository determination module 610 is configured to determine, in response to receiving a data change request for the first repository, a second repository associated with the first repository based on a version repository identification in the data change request, wherein the data change request further includes changed data, the changed data being changed data in the first repository, the version repository corresponding to the version repository identification being generated based on the data in the first repository and the second repository. In an embodiment, the second repository determination module 610 may be configured to perform the operation S310 described above, which is not described herein.
The target data determination module 620 is configured to determine, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository. In an embodiment, the target data determination module 620 may be configured to perform operation S320 described above, which is not repeated here.
The data detection module 630 is configured to detect the changed data based on the plurality of target data to obtain a detection result. In an embodiment, the data detection module 630 may be configured to perform operation S330 described above, which is not repeated here.
The data response module 640 is configured to, in a case where the detection result indicates that the changed data is abnormal data, respond to the data change request by rejecting the changed data from entering the version library. In an embodiment, the data response module 640 may be configured to perform operation S340 described above, which is not repeated here.
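How the four modules cooperate can be sketched as a small Python class. This is a hypothetical arrangement, not the disclosed implementation: `DataChangeRequest`, `DataProcessingApparatus`, and the injected `find_targets` and `is_abnormal` callables are illustrative names, and the version repository lookup is assumed to be a simple dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DataChangeRequest:
    version_repo_id: str  # version repository identification
    changed_data: str     # data that has been changed in the first repository


class DataProcessingApparatus:
    """Hypothetical sketch wiring modules 610-640 together; names are illustrative."""

    def __init__(
        self,
        version_repos: Dict[str, Tuple[str, str]],
        find_targets: Callable[[str, str, str], List[str]],
        is_abnormal: Callable[[str, List[str]], bool],
    ) -> None:
        # Assumed lookup: version repository id -> (first repository, second repository).
        self.version_repos = version_repos
        self.find_targets = find_targets
        self.is_abnormal = is_abnormal

    def handle(self, request: DataChangeRequest) -> bool:
        """Return True if the changed data may enter the version library."""
        # Module 610 (S310): determine the second repository from the version repository id.
        first_repo, second_repo = self.version_repos[request.version_repo_id]
        # Module 620 (S320): determine target data associated with the changed data.
        targets = self.find_targets(request.changed_data, first_repo, second_repo)
        # Module 630 (S330): detect the changed data against the target data.
        abnormal = self.is_abnormal(request.changed_data, targets)
        # Module 640 (S340): reject abnormal data from entering the version library.
        return not abnormal
```

Injecting the target-data search and the anomaly test as callables keeps the sketch independent of how operations S320 and S330 are realized, for example by the content-merging or content non-merging branches.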
According to an embodiment of the present disclosure, the second repository determination module further includes a first determination sub-module.
The first determination sub-module is configured to determine a home file type of the changed data based on attribute information of the changed data, where the home file type represents the type of the target file in which the changed data is located.
According to an embodiment of the present disclosure, the target data determination module includes a second determination sub-module and a third determination sub-module.
The second determination sub-module is configured to determine a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data; and the third determination sub-module is configured to determine the plurality of target data based on the plurality of first target data and the plurality of second target data.
According to an embodiment of the present disclosure, the second determination submodule includes a first determination subunit, a second determination subunit, and a third determination subunit.
The first determination subunit is configured to, in a case where the home file type of the changed data is the content merging type, determine, based on the file identifier of the target file in which the changed data is located, a first file to be merged from the first repository and a second file to be merged from the second repository, each of which is to be content-merged with the target file; the second determination subunit is configured to determine a plurality of first target data based on a plurality of data included in the first file to be merged; and the third determination subunit is configured to determine a plurality of second target data based on a plurality of data included in the second file to be merged.
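A possible reading of the content-merging branch, sketched in Python under stated assumptions: each repository is modeled as a list of hypothetical `RepoFile` records, a "file to be merged" is assumed to be any other file carrying the same file identifier, and the combined target data are the de-duplicated union of both lists (corresponding to the third determination sub-module). `RepoFile` and `merge_type_targets` do not appear in the disclosure.

```python
from typing import List, NamedTuple, Tuple


class RepoFile(NamedTuple):
    # Hypothetical file record; the disclosure does not define a data model.
    path: str        # location of the file in its repository
    file_id: str     # file identifier, e.g. a bare file name
    data: List[str]  # data items contained in the file


def merge_type_targets(
    target: RepoFile, first_repo: List[RepoFile], second_repo: List[RepoFile]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (first target data, second target data, combined target data)."""
    # First file(s) to be merged: same identifier in the first repository, excluding the target.
    first = [d for f in first_repo if f.file_id == target.file_id and f.path != target.path for d in f.data]
    # Second file(s) to be merged: same identifier anywhere in the second repository.
    second = [d for f in second_repo if f.file_id == target.file_id for d in f.data]
    # Combine both lists, keeping first-seen order and dropping repeats.
    combined = list(dict.fromkeys(first + second))
    return first, second, combined
```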
According to an embodiment of the present disclosure, the second determination sub-module further includes an extraction unit and a fourth determination subunit.
The extraction unit is configured to, in a case where the home file type of the changed data is the content non-merging type, extract a keyword from the file identifier of the target file to obtain a target keyword; and the fourth determination subunit is configured to determine, based on the target keyword, a plurality of first target file identifiers matching the target keyword from the first repository as the plurality of first target data, and a plurality of second target file identifiers matching the target keyword from the second repository as the plurality of second target data.
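The keyword matching of the content non-merging branch might look like the following sketch. The keyword rule (file stem, lower-cased, with separators and version or numeric tokens removed) and the substring match are assumptions chosen for illustration; `extract_keyword` and `non_merge_targets` are hypothetical names, and `first_ids` is assumed to exclude the target file itself.

```python
import re
from typing import Iterable, List, Tuple


def extract_keyword(file_id: str) -> str:
    """Hypothetical keyword rule: lower-cased stem without separators or version/number tokens."""
    stem = file_id.rsplit(".", 1)[0]
    tokens = re.split(r"[^A-Za-z0-9]+", stem.lower())
    # Drop empty tokens and tokens such as "v2" or "1".
    return "".join(t for t in tokens if t and not re.fullmatch(r"v?\d+", t))


def non_merge_targets(
    target_file_id: str, first_ids: Iterable[str], second_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (first target file identifiers, second target file identifiers)."""
    keyword = extract_keyword(target_file_id)
    # Assumption: an identifier "matches" if its own keyword contains the target keyword.
    first = [i for i in first_ids if keyword in extract_keyword(i)]
    second = [i for i in second_ids if keyword in extract_keyword(i)]
    return first, second
```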
According to an embodiment of the present disclosure, the data detection module includes a first calculation sub-module and a fourth determination sub-module.
The first calculation sub-module is configured to, in a case where the home file type of the changed data is the content merging type, calculate the similarity between the data information of each target data and the data information of the changed data to obtain a plurality of first similarity values, where the data information includes a data identifier and interface information; and the fourth determination sub-module is configured to determine a detection result characterizing the changed data as abnormal data in a case where a first target similarity value exceeding a target threshold exists among the plurality of first similarity values.
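One way to realize the first similarity values is sketched below using `difflib.SequenceMatcher` from the Python standard library. The disclosure does not specify a similarity measure; comparing the concatenated data identifier and interface information as text, and the default threshold of 0.9, are assumptions, and `DataInfo` and `merge_type_is_abnormal` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, NamedTuple


class DataInfo(NamedTuple):
    # Hypothetical container for the data information named in the disclosure.
    data_id: str    # data identifier
    interface: str  # interface information, e.g. a request path or method signature


def similarity(a: DataInfo, b: DataInfo) -> float:
    # Assumption: compare identifier and interface together as one string.
    return SequenceMatcher(None, f"{a.data_id}|{a.interface}", f"{b.data_id}|{b.interface}").ratio()


def merge_type_is_abnormal(changed: DataInfo, targets: Iterable[DataInfo], threshold: float = 0.9) -> bool:
    """Abnormal if any first similarity value exceeds the (hypothetical) target threshold."""
    first_values: List[float] = [similarity(changed, t) for t in targets]
    return any(v > threshold for v in first_values)
```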
According to an embodiment of the present disclosure, the data detection module further includes a second calculation sub-module and a fifth determination sub-module.
The second calculation sub-module is configured to, in a case where the home file type of the changed data is the content non-merging type, calculate the similarity between the file identifier of the target file and each of the plurality of target data to obtain a plurality of second similarity values; and the fifth determination sub-module is configured to determine a detection result characterizing the changed data as abnormal data in a case where a second target similarity value exceeding a target threshold exists among the plurality of second similarity values.
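The second similarity values can be sketched in the same spirit. The case-insensitive `SequenceMatcher` ratio and the 0.9 default threshold are illustrative assumptions rather than the disclosed method, and `second_similarity_values` and `non_merge_is_abnormal` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def second_similarity_values(target_file_id: str, target_data: Iterable[str]) -> List[float]:
    # Assumption: case-insensitive character similarity between file identifiers.
    return [SequenceMatcher(None, target_file_id.lower(), t.lower()).ratio() for t in target_data]


def non_merge_is_abnormal(target_file_id: str, target_data: Iterable[str], threshold: float = 0.9) -> bool:
    """Abnormal if any second similarity value exceeds the (hypothetical) target threshold."""
    return any(v > threshold for v in second_similarity_values(target_file_id, target_data))
```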
According to an embodiment of the present disclosure, the data processing apparatus further includes an identification module, a determination module, and a storage module.
The identification module is configured to identify a plurality of data identifiers in a shared space based on the target data identifier of the changed data to obtain an identification result, where the target data identifier of the changed data is the same as the target data identifier of the data before the change, the plurality of data identifiers in the shared space correspond one-to-one to a plurality of target historical data, and the plurality of target historical data are determined from the first repository and the second repository based on a historical data change request; the determination module is configured to determine a plurality of target data associated with the changed data from the first repository and the second repository, respectively, in a case where the identification result indicates that no data identifier associated with the target data identifier exists in the shared space; and the storage module is configured to store the plurality of target data and the target data identifier in the shared space.
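The shared space behaves much like a cache keyed by the target data identifier. The sketch below assumes the shared space is an in-memory dictionary and that the search over the first and second repositories is supplied as a callable; `get_target_data` and `find_in_repositories` are hypothetical names.

```python
from typing import Callable, Dict, List

# Hypothetical shared space: target data identifier -> target data found for an earlier request.
SharedSpace = Dict[str, List[str]]


def get_target_data(
    space: SharedSpace,
    target_data_id: str,
    find_in_repositories: Callable[[str], List[str]],
) -> List[str]:
    # Identification: the identifier is unchanged by the edit, so reuse a stored entry if present.
    cached = space.get(target_data_id)
    if cached is not None:
        return cached
    # Not found: search the first and second repositories, then store the result.
    targets = find_in_repositories(target_data_id)
    space[target_data_id] = targets
    return targets
```

Because the identifier is unchanged by an edit, a later change request for the same data can reuse the stored target data instead of searching both repositories again.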
According to an embodiment of the present disclosure, any plurality of the second repository determination module 610, the target data determination module 620, the data detection module 630, and the data response module 640 may be combined into one module, or any one of them may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the second repository determination module 610, the target data determination module 620, the data detection module 630, and the data response module 640 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC); may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuitry; or may be implemented in any one of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the second repository determination module 610, the target data determination module 620, the data detection module 630, and the data response module 640 may be at least partially implemented as a computer program module, which, when executed, may perform the corresponding function.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a data processing method according to an embodiment of the disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. Note that the program may be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 700 may further include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read therefrom can be installed into the storage section 708 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 702 and/or RAM 703 and/or one or more memories other than ROM 702 and RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. When the computer program product runs in a computer system, the program code causes the computer system to implement the data processing method provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed over a network medium in the form of signals, downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or integrated without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (12)

1. A data processing method, comprising:
in response to receiving a data change request for a first repository, determining a second repository associated with the first repository based on a version repository identification in the data change request, wherein the data change request further includes changed data, the changed data being data that has been changed in the first repository, and a version repository corresponding to the version repository identification is generated based on data in the first repository and the second repository;
determining, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository;
detecting the changed data based on the plurality of target data to obtain a detection result; and
in a case where the detection result indicates that the changed data is abnormal data, responding to the data change request by rejecting the changed data from entering the version library.
2. The method of claim 1, wherein the first repository comprises a target file, the method further comprising:
determining a home file type of the changed data based on attribute information of the changed data, wherein the home file type represents the type of the target file in which the changed data is located.
3. The method of claim 2, wherein the determining, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository comprises:
determining a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data; and
determining the plurality of target data based on the plurality of first target data and the plurality of second target data.
4. The method of claim 3, wherein the home file type comprises a content merging type, and the determining a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data comprises:
in a case where the home file type of the changed data is the content merging type, determining, based on a file identifier of the target file in which the changed data is located, a first file to be merged from the first repository and a second file to be merged from the second repository, each for content merging with the target file;
determining the plurality of first target data based on a plurality of data included in the first file to be merged; and
determining the plurality of second target data based on a plurality of data included in the second file to be merged.
5. The method of claim 3, wherein the home file type comprises a content non-merging type, and the determining a plurality of first target data from the first repository and a plurality of second target data from the second repository based on the home file type of the changed data comprises:
in a case where the home file type of the changed data is the content non-merging type, extracting a keyword from the file identifier of the target file to obtain a target keyword; and
determining, based on the target keyword, a plurality of first target file identifiers matching the target keyword from the first repository as the plurality of first target data, and a plurality of second target file identifiers matching the target keyword from the second repository as the plurality of second target data.
6. The method of claim 2, wherein the detecting the changed data based on the plurality of target data to obtain a detection result comprises:
in a case where the home file type of the changed data is a content merging type, calculating a similarity between the data information of each of the plurality of target data and the data information of the changed data to obtain a plurality of first similarity values, wherein the data information comprises a data identifier and interface information; and
determining a detection result characterizing the changed data as abnormal data in a case where a first target similarity value exceeding a target threshold exists among the plurality of first similarity values.
7. The method of claim 2, wherein the detecting the changed data based on the plurality of target data to obtain a detection result comprises:
in a case where the home file type of the changed data is a content non-merging type, calculating a similarity between the file identifier of the target file and each of the plurality of target data to obtain a plurality of second similarity values; and
determining a detection result characterizing the changed data as abnormal data in a case where a second target similarity value exceeding a target threshold exists among the plurality of second similarity values.
8. The method according to claim 1, further comprising:
identifying a plurality of data identifiers in a shared space based on a target data identifier of the changed data to obtain an identification result, wherein the target data identifier of the changed data is the same as the target data identifier of the data before the change, the plurality of data identifiers in the shared space correspond one-to-one to a plurality of target historical data, and the plurality of target historical data are determined from the first repository and the second repository based on a historical data change request;
determining a plurality of target data associated with the changed data from the first repository and the second repository, respectively, in a case where the identification result indicates that no data identifier associated with the target data identifier exists in the shared space; and
storing the plurality of target data and the target data identifier in the shared space.
9. A data processing apparatus comprising:
a second repository determination module configured to, in response to receiving a data change request for a first repository, determine a second repository associated with the first repository based on a version repository identification in the data change request, wherein the data change request further includes changed data, the changed data being data that has been changed in the first repository, and a version repository corresponding to the version repository identification is generated based on data in the first repository and the second repository;
a target data determination module configured to determine, based on the changed data, a plurality of target data associated with the changed data from the first repository and the second repository;
a data detection module configured to detect the changed data based on the plurality of target data to obtain a detection result; and
a data response module configured to, in a case where the detection result indicates that the changed data is abnormal data, respond to the data change request by rejecting the changed data from entering the version library.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202310943502.7A 2023-07-31 2023-07-31 Data processing method, device, equipment and storage medium Pending CN116975033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310943502.7A CN116975033A (en) 2023-07-31 2023-07-31 Data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310943502.7A CN116975033A (en) 2023-07-31 2023-07-31 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116975033A 2023-10-31

Family

ID=88481020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310943502.7A Pending CN116975033A (en) 2023-07-31 2023-07-31 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116975033A (en)

Similar Documents

Publication Publication Date Title
US8949371B1 (en) Time and space efficient method and system for detecting structured data in free text
US11968162B1 (en) Message content cleansing
US10554701B1 (en) Real-time call tracing in a service-oriented system
US20240205020A1 (en) Data Storage Method, Apparatus, and System, Storage Medium, and Program Product
Zhang et al. Incremental graph pattern matching algorithm for big graph data
CN113326064A (en) Method for dividing business logic module, electronic equipment and storage medium
CN110769055B (en) Method, device, medium and electronic equipment for realizing service discovery
CN114070847A (en) Current limiting method, device, equipment and storage medium of server
CN116244751A (en) Data desensitization method, device, electronic equipment, storage medium and program product
CN116975033A (en) Data processing method, device, equipment and storage medium
CN113419887B (en) Method and device for processing online transaction exception of host
US20220292417A1 (en) Using weighted peer groups to selectively trigger a security alert
CN112732471A (en) Error correction method and error correction device for interface return data
CN113762910A (en) Document monitoring method and device
KR101999130B1 (en) System and method of detecting confidential information based on 2-tier for endpoint DLP
CN114938341B (en) Environment detection method and device, electronic equipment and storage medium
CN113094268B (en) Test method, test device, test equipment and test medium
CN117033383A (en) Data detection method, device, equipment and storage medium
CN115190008B (en) Fault processing method, fault processing device, electronic equipment and storage medium
CN116737532A (en) Data processing method, device, equipment and storage medium
CN117077098A (en) Information processing method, apparatus, electronic device and storage medium
CN116561803A (en) Security policy information processing method, device, equipment and storage medium
CN118445209A (en) Software detection method, device, equipment, medium and program product
CN118152963A (en) Transaction abnormality detection method, device, electronic equipment and computer storage medium
CN117914870A (en) Data query method, device, electronic equipment, medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination