CN115905637A - Data grouping method and device - Google Patents

Info

Publication number
CN115905637A
Authority
CN
China
Prior art keywords
data
preset
grouped
keyword
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211578257.6A
Other languages
Chinese (zh)
Inventor
陶予祺
童刚
邵倩倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qichacha Technology Co ltd
Original Assignee
Qichacha Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qichacha Technology Co ltd filed Critical Qichacha Technology Co ltd
Priority to CN202211578257.6A priority Critical patent/CN115905637A/en
Publication of CN115905637A publication Critical patent/CN115905637A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data grouping method, apparatus, computer device, storage medium, and computer program product. The method comprises: obtaining, from a database, first data whose similarity to data to be grouped is greater than a first preset value, wherein the database comprises a plurality of data groups; obtaining a target data group to which the first data belongs when a preset keyword of the data to be grouped and the corresponding keyword of the first data satisfy a first preset condition; and storing the data to be grouped in the target data group when the preset keyword and the keyword corresponding to second data in the target data group satisfy a second preset condition, wherein the second data is the data in the target data group other than the first data. With this method, a large amount of data can be grouped efficiently and accurately.

Description

Data grouping method and device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data grouping method and apparatus, a computer device, and a storage medium.
Background
A data sorting platform screens and integrates data and pushes it to users. In some application scenarios with many data sources, there is a large amount of scattered data, and the platform needs to find the data that matches user requirements among it and push that data to users.
In conventional approaches, data is sorted directly according to the association relationships among the data itself, yielding multiple groups of associated data. However, sorting large volumes of scattered data in this way is labor-intensive and inefficient.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an efficient and accurate data grouping method, apparatus, computer device, storage medium and computer program product.
In a first aspect, the disclosed embodiments provide a data grouping method. The method comprises the following steps:
acquiring first data with similarity greater than a first preset value with data to be grouped from a database, wherein the database comprises a plurality of data groups;
under the condition that the preset keywords of the data to be grouped and the corresponding keywords of the first data accord with a first preset condition, acquiring a target data group to which the first data belong;
and storing the data to be grouped into the target data group under the condition that the preset keyword and a keyword corresponding to second data in the target data group accord with a second preset condition, wherein the second data is data except the first data in the target data group.
In one embodiment, the obtaining, from the database, first data whose similarity to the data to be grouped is greater than a first preset value includes:
acquiring a plurality of first data with similarity greater than a first preset value with the data to be grouped from a database;
the plurality of first data are arranged in a descending order according to the similarity to obtain a first data group;
and sequentially acquiring first data from the first data group.
In one embodiment, the obtaining, from the database, first data whose similarity to the data to be grouped is greater than a preset value includes:
acquiring preset keywords of data to be grouped and corresponding keywords of data in a database;
and determining the data as first data under the condition that the similarity between the preset keyword and the corresponding keyword is greater than a second preset value.
In one embodiment, the obtaining, from the database, first data whose similarity to the data to be grouped is greater than a preset value includes:
acquiring preset keywords of data to be grouped and keywords corresponding to data in a database;
acquiring the similarity between the data and the data to be grouped under the condition that the corresponding keyword is empty, and determining the data to be first data under the condition that the similarity is greater than a first threshold; and/or,
and under the condition that the preset keyword is the same as the corresponding keyword, acquiring the similarity between the data and the data to be grouped, and under the condition that the similarity is greater than a second threshold value, determining that the data is first data.
In one embodiment, the data in the data group is marked with a corresponding data tag, and the obtaining of the target data group to which the first data belongs includes:
determining a target data tag of the first data;
and determining a target data group from the database according to the target data label.
In one embodiment, the obtaining a target data group to which the first data belongs when the preset keyword of the data to be grouped and the corresponding keyword of the first data meet a first preset condition includes:
acquiring a first preset keyword of the data to be grouped and a second preset keyword corresponding to the first data;
under the condition that the first preset keyword and the second preset keyword are the same, acquiring a target data group to which the first data belongs; and/or,
and under the condition that the first preset keyword is null and/or the second preset keyword is null, acquiring a target data group to which the first data belongs.
In a second aspect, an embodiment of the present disclosure further provides a data grouping apparatus. The device comprises:
a first obtaining module, configured to obtain, from a database, first data whose similarity to the data to be grouped is greater than a first preset value, wherein the database comprises a plurality of data groups;
a second obtaining module, configured to obtain a target data group to which the first data belongs when a preset keyword of the to-be-grouped data and a keyword corresponding to the first data meet a first preset condition;
and the grouping module is used for storing the data to be grouped into the target data group under the condition that the preset keywords and the keywords corresponding to second data in the target data group accord with a second preset condition, wherein the second data is data in the target data group except the first data.
In one embodiment, the first obtaining module includes:
the first acquisition submodule is used for acquiring a plurality of first data with similarity larger than a first preset value with the data to be grouped from the database;
the sorting module is used for carrying out descending sorting on the plurality of first data according to the similarity to obtain a first data group;
and the second acquisition submodule is used for sequentially acquiring the first data from the first data group.
In one embodiment, the first obtaining module includes:
the third acquisition submodule is used for acquiring preset keywords of the data to be grouped and corresponding keywords of the data in the database;
and the first determining module is used for determining the data as the first data under the condition that the similarity between the preset keyword and the corresponding keyword is greater than a second preset value.
In one embodiment, the first obtaining module includes:
the fourth obtaining submodule is used for obtaining preset keywords of the data to be grouped and keywords corresponding to the data in the database;
a second determining module, configured to obtain a similarity between the data and the data to be grouped when the corresponding keyword is empty, and determine that the data is the first data when the similarity is greater than a first threshold; and/or,
a third determining module, configured to obtain a similarity between the data and the data to be grouped under the condition that the preset keyword is the same as the corresponding keyword, and determine that the data is the first data under the condition that the similarity is greater than a second threshold.
In one embodiment, the data in the data group is marked with a corresponding data tag, and the second obtaining module includes:
a fourth determining module, configured to determine a target data tag of the first data;
and the fifth determining module is used for determining a target data group from the database according to the target data label.
In one embodiment, the second obtaining module includes:
a fifth obtaining submodule, configured to obtain a first preset keyword of the to-be-grouped data and a second preset keyword corresponding to the first data;
a sixth obtaining sub-module, configured to obtain, when the first preset keyword is the same as the second preset keyword, a target data group to which the first data belongs; and/or,
and the seventh obtaining submodule is used for obtaining the target data group to which the first data belongs under the condition that the first preset keyword is null and/or the second preset keyword is null.
In a third aspect, an embodiment of the present disclosure further provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method according to any of the embodiments of the present disclosure when executing the computer program.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the embodiments of the present disclosure.
In a fifth aspect, the disclosed embodiments also provide a computer program product. The computer program product comprising a computer program that when executed by a processor implements the steps of the method of any of the embodiments of the present disclosure.
According to the data grouping method and device, when data is grouped, first data whose similarity to the data to be grouped is greater than a first preset value is obtained from a database, and the keywords of the data to be grouped and of the first data are compared. When the preset keyword of the data to be grouped and the corresponding keyword of the first data satisfy a first preset condition, the target data group to which the first data belongs is obtained. When the preset keyword of the data to be grouped and the keywords corresponding to the other data in the target data group satisfy a second preset condition, the data to be grouped is stored in the target data group, which completes the grouping. Because the data group is determined from the association between the data to be grouped and the data already in the database, both the efficiency and the accuracy of data grouping are improved, and data integration and screening are carried out efficiently and accurately. The implementation is simple, accurate data pushing can be provided to users, and the user experience is improved.
Drawings
FIG. 1 is a flow diagram illustrating a method for grouping data in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for grouping data in one embodiment;
FIG. 3 is a flow diagram that illustrates a method for grouping data in accordance with one embodiment;
FIG. 4 is a flow diagram that illustrates a method for grouping data in accordance with one embodiment;
FIG. 5 is a flow diagram that illustrates a method for grouping data in one embodiment;
FIG. 6 is a block diagram showing the structure of a data grouping means in one embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clearly understood, the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings and the embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the embodiments of the disclosure and that no limitation to the embodiments of the disclosure is intended.
In one embodiment, as shown in fig. 1, there is provided a data grouping method, the method comprising:
step S110, acquiring first data with similarity greater than a first preset value with data to be grouped from a database, wherein the database comprises a plurality of data groups;
In the embodiment of the present disclosure, when grouping data, the data to be grouped is first acquired; it generally includes one or more keywords. In one example, multiple keywords may be distinguished by field. First data whose similarity to the data to be grouped is greater than a first preset value is then obtained from a database, where the database includes a plurality of data groups and the data stored in it includes grouped data. In one example, each piece of data has a publication time; after the publication time of the data to be grouped is obtained, only the database data within a preset time interval around that publication time is retrieved. The preset time interval is usually set according to the actual application scenario; for example, it may be set to half a year before and after the publication time. The similarity between data may be determined by comparing preset keywords or by comparing the data as a whole; in one example, the keywords corresponding to preset fields are compared. The first preset value is usually set according to the actual application scenario. When the similarity between data in the database and the data to be grouped is greater than the first preset value, the two can be considered closely associated, and further judgment can be performed. In general, the data in the database and the data to be grouped belong to the same category, so their corresponding keywords have the same attributes. In one example, the data to be grouped includes bidding data, which can be obtained from published bidding documents.
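The publication-time pre-filter mentioned above can be illustrated with a minimal Python sketch. The record layout, the function name `within_window`, and the 183-day approximation of "half a year" are assumptions made for illustration; the source only states that the interval is set per application scenario.

```python
from datetime import date, timedelta

# Hypothetical record layout: (record_id, publication_date).
def within_window(candidates, published, days=183):
    """Keep records published within `days` before or after `published`."""
    window = timedelta(days=days)
    return [c for c in candidates if abs(c[1] - published) <= window]

records = [
    ("a", date(2022, 1, 10)),
    ("b", date(2022, 6, 1)),
    ("c", date(2023, 3, 1)),
]

# Only records close in time to the data to be grouped go on to the
# similarity comparison.
nearby = within_window(records, date(2022, 5, 1))
nearby_ids = [record_id for record_id, _ in nearby]
```

Filtering by time first keeps the more expensive similarity comparison limited to a smaller candidate set.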
Step S120, under the condition that the preset keywords of the data to be grouped and the corresponding keywords of the first data accord with a first preset condition, acquiring a target data group to which the first data belong;
In the embodiment of the present disclosure, a preset keyword of the data to be grouped is obtained. In one example, the data to be grouped includes a plurality of keywords, each with an associated keyword attribute, and the preset keyword is obtained through a preset keyword attribute. The keyword of the first data that has the same keyword attribute as the preset keyword is then obtained. When the preset keyword of the data to be grouped and the corresponding keyword of the first data satisfy a first preset condition, the target data group to which the first data belongs is obtained. In one example, the first preset condition may include that the preset keyword is the same as the corresponding keyword, and that the similarity between the preset keyword and the corresponding keyword is greater than a preset threshold. In general, when the first preset condition is satisfied, the data to be grouped and the first data are likely to belong to the same data group, so further analysis is performed using the other data in the data group to which the first data belongs; the target data group is obtained for this purpose. In one example, data in the database carries a data tag that identifies its data group, and data with the same data tag forms one data group. When the target data group is obtained, the data in the database that carries the same data tag as the first data is determined to be the target data group. In one example, when the data to be grouped is bidding data, the keywords may include, but are not limited to, a project number, a bidding company, an agent company, a province code, a project name, and the like.
Step S130, when the preset keyword and a keyword corresponding to second data in the target data group meet a second preset condition, storing the data to be grouped into the target data group, where the second data is data in the target data group except the first data.
In the embodiment of the present disclosure, after the target data group is obtained, the data in it other than the first data, that is, the second data, is obtained. The preset keyword of the data to be grouped is compared with the corresponding keyword of the second data, and the data to be grouped is stored in the target data group when the two satisfy a second preset condition. The second preset condition is usually set according to the actual application scenario; in one example, it may be the same as the first preset condition. Because data in the same data group is associated, checking the data to be grouped against the other data in the group makes the final grouping result more accurate and reliable. In general, when the second preset condition is satisfied, the data to be grouped can be considered to belong to the target data group, and storing it there completes the grouping. In one example, data in the database carries a data tag that identifies its data group, and data with the same data tag forms one data group; when the data to be grouped is stored in the target data group, the data tag of the first data may be added to it. In one example, when the second preset condition is not satisfied, a new data group can be created and the data to be grouped stored in it.
In one example, there may be one or more pieces of second data. When there is one piece of second data, the data to be grouped is stored in the target data group if the keyword corresponding to that second data satisfies the second preset condition. When there are several pieces of second data, the data to be grouped is stored in the target data group if the keywords corresponding to all of the second data satisfy the second preset condition. In one example, the second data may also be empty; in that case the data to be grouped may be stored in a new data group or in the target data group, depending on the actual application scenario.
According to the data grouping method and device, when data is grouped, first data whose similarity to the data to be grouped is greater than a first preset value is obtained from a database, and the keywords of the data to be grouped and of the first data are compared. When the preset keyword of the data to be grouped and the corresponding keyword of the first data satisfy a first preset condition, the target data group to which the first data belongs is obtained. When the preset keyword of the data to be grouped and the keywords corresponding to the other data in the target data group satisfy a second preset condition, the data to be grouped is stored in the target data group, which completes the grouping. The data to be grouped is compared against the field values of already grouped data in the database, and its data group is determined from the association between it and the database data, which improves both the efficiency and the accuracy of data grouping. The implementation is simple, data integration and screening are carried out efficiently and accurately, accurate data pushing can be provided to users, and the user experience is improved. Data from different sources can be integrated and screened, and data groups are determined according to the associations among the data. With this embodiment, different types of files published at different times for the same project can be associated and grouped.
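Steps S110 to S130 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names (`name`, `project_no`, `group`), the use of `difflib.SequenceMatcher` as the similarity measure, the 0.6 threshold, and the choice of "equal or empty" as both preset conditions are all hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Assumed similarity measure; the source does not name one.
    return SequenceMatcher(None, a, b).ratio()

def condition_met(kw_a, kw_b):
    # Assumed preset condition: keywords are equal, or either one is empty.
    return not kw_a or not kw_b or kw_a == kw_b

def assign_group(new, database, first_preset_value=0.6):
    # S110: candidates above the first preset value, most similar first.
    scored = [(similarity(new["name"], r["name"]), r) for r in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [r for score, r in scored if score > first_preset_value]
    for first in candidates:
        # S120: first preset condition on the preset keyword.
        if not condition_met(new["project_no"], first["project_no"]):
            continue
        group = [r for r in database if r["group"] == first["group"]]
        second = [r for r in group if r is not first]
        # S130: every other member of the target group must also match.
        # An empty `second` passes here; the source allows either outcome.
        if all(condition_met(new["project_no"], r["project_no"]) for r in second):
            new["group"] = first["group"]
            database.append(new)
            return new["group"]
    # No suitable group found: open a new one.
    new["group"] = max((r["group"] for r in database), default=0) + 1
    database.append(new)
    return new["group"]

db = [
    {"name": "City hospital equipment purchase tender", "project_no": "P-001", "group": 1},
    {"name": "City hospital equipment purchase award notice", "project_no": "P-001", "group": 1},
    {"name": "Highway bridge repair tender", "project_no": "P-002", "group": 2},
]
group_id = assign_group(
    {"name": "City hospital equipment purchase tender amendment", "project_no": "P-001"},
    db,
)
```

In this sketch the group tag is stored on each record, so storing the data to be grouped in the target data group amounts to copying the tag of the first data onto it.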
In an embodiment, as shown in fig. 2, the acquiring, from the database, first data whose similarity to the data to be grouped is greater than a first preset value includes:
step S111, acquiring a plurality of first data with similarity greater than a first preset value with the data to be grouped from a database;
step S112, arranging the plurality of first data in a descending order according to the similarity to obtain a first data group;
step S113, sequentially acquiring first data from the first data group.
In the embodiment of the present disclosure, when the first data is acquired, a plurality of first data whose similarity to the data to be grouped is greater than a first preset value are obtained from the database. The plurality of first data are sorted in descending order of similarity, and the sorted first data form a first data group. When the data to be grouped is compared with the first data, the first data are taken from the first data group in that order. In one example, first data with the same similarity may be ordered by a further priority, such as the publication time of the first data. Generally, the higher the similarity between two pieces of data, the closer their association. In one example, when the data to be grouped and the first data are compared, the first data can be obtained by traversing the first data group until the traversal is finished.
According to the data grouping method and device, the first data are sorted by similarity to obtain the first data group, and the first data are then taken in order for comparison. The most closely associated first data are therefore compared first, so the data group to which the data to be grouped belongs can be determined as early as possible. This preserves grouping accuracy while improving grouping efficiency and reducing the grouping workload.
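The ordering in steps S111 to S113 can be sketched as a single sort. The tuple layout and the tie-break direction (more recent publication first) are assumptions; the source only says that a further priority such as publication time may be used for equal similarities.

```python
from datetime import date

# Hypothetical candidates: (record_id, similarity, publication_date).
candidates = [
    ("a", 0.82, date(2022, 3, 1)),
    ("b", 0.95, date(2022, 1, 15)),
    ("c", 0.82, date(2022, 5, 20)),
]

# Highest similarity first; among equal scores, the more recent record first
# (the tie-break direction is an assumption).
first_data_group = sorted(candidates, key=lambda c: (c[1], c[2]), reverse=True)

# The first data are then taken in this order for comparison.
order = [record_id for record_id, _, _ in first_data_group]
```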
In an embodiment, as shown in fig. 3, the acquiring, from the database, first data whose similarity to the data to be grouped is greater than a preset value includes:
step S114, acquiring preset keywords of the data to be grouped and corresponding keywords of the data in the database;
step S115, determining the data as the first data when the similarity between the preset keyword and the corresponding keyword is greater than a second preset value.
In the embodiment of the present disclosure, when the similarity between the data to be grouped and the first data is compared, the keywords may be compared. The method comprises the steps of obtaining preset keywords of data to be grouped, obtaining keywords corresponding to the data in a database, and determining the data in the database as first data under the condition that the similarity between the preset keywords and the corresponding keywords is larger than a second preset value. The second preset value is usually set according to an actual application scenario, and when the similarity is greater than the second preset value, the association relationship between the two data at this time may be considered to be relatively close, and the two data may belong to the same data group. In an example, when the keyword is obtained, a preset keyword of the to-be-grouped data corresponding to the preset keyword attribute and a keyword corresponding to the data in the database corresponding to the preset keyword attribute may be determined according to the keyword attribute. In one example, the data may include a plurality of fields, each field corresponding to a keyword, and when acquiring the keyword, the keyword of the corresponding field is directly acquired. In one example, the preset keyword may be a data item name of the data to be grouped, the corresponding keyword in the database may be a data item name of the data in the database, and when the similarity between the data item name in the database and the data item name of the data to be grouped is greater than a second preset value, the data is determined to be the first data.
According to the embodiment of the present disclosure, the first data in the database is determined by comparing data keywords. This ensures that the obtained first data is associated with the data to be grouped, reduces the data processing workload, and improves the efficiency of obtaining the first data and therefore of data grouping, so that efficient and accurate data grouping can be achieved.
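As a sketch of steps S114 and S115, the following Python compares one preset keyword field of the data to be grouped with the same field of each database record. The field name `project_name`, the case-insensitive `difflib.SequenceMatcher` ratio, and the 0.7 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def keyword_similarity(a, b):
    # Assumed measure: case-insensitive character-level ratio in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def select_first_data(new_record, database, field="project_name",
                      second_preset_value=0.7):
    """Return database records whose keyword in `field` is similar enough."""
    preset_keyword = new_record[field]
    return [
        record for record in database
        if keyword_similarity(preset_keyword, record[field]) > second_preset_value
    ]

database = [
    {"project_name": "Library Renovation Project (Phase 1)"},
    {"project_name": "School canteen catering service"},
]
first_data = select_first_data({"project_name": "Library renovation project"}, database)
```

Comparing a single short field rather than the whole record is what keeps this variant cheap.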
In an embodiment, as shown in fig. 4, the acquiring, from the database, first data whose similarity to the data to be grouped is greater than a preset value includes:
step S116, acquiring preset keywords of the data to be grouped and keywords corresponding to the data in the database;
step S117, acquiring a similarity between the data and the data to be grouped when the corresponding keyword is empty, and determining that the data is first data when the similarity is greater than a first threshold; and/or,
step S118, acquiring a similarity between the data and the data to be grouped under the condition that the preset keyword is the same as the corresponding keyword, and determining that the data is the first data under the condition that the similarity is greater than a second threshold.
In the embodiment of the present disclosure, when the first data is acquired, a preset keyword of the data to be grouped and the corresponding keyword of data in the database are obtained and compared. When the corresponding keyword is empty, the similarity between the data and the data to be grouped is obtained, and the data is determined to be first data if the similarity is greater than a first threshold, where the first threshold is set according to the actual application scenario. In one example, the preset keyword and the corresponding keyword may include the item number; when the item number is empty and the similarity between the data and the data to be grouped exceeds the threshold, the two can be considered closely related, and the data is determined to be first data. In one example, the similarity may be computed over the whole piece of data or over specific keywords in the data; for example, the similarity of the item names may be determined. When the corresponding keyword is not empty and is the same as the preset keyword, the similarity between the data and the data to be grouped is obtained, and the data is determined to be first data if the similarity is greater than a second threshold, where the second threshold is set according to the actual application scenario.
In one example, the preset keyword and the corresponding keyword may include a keyword corresponding to an item number, and in the case that the keywords are the same, that is, the data to be grouped is the same as the item number of the data, the two may be considered as similar data, and therefore, the second threshold may be set to a smaller value, for example, the second threshold may be set to zero, that is, in the case that the keywords are the same, the data may be directly determined as the first data. In an example, when the keyword is obtained, a preset keyword of the to-be-grouped data corresponding to the preset keyword attribute and a keyword corresponding to data in the database corresponding to the preset keyword attribute may be determined according to the keyword attribute. In one example, the data may include a plurality of fields, each field corresponding to a keyword, and when acquiring the keyword, the keyword of the corresponding field is directly acquired. In one example, the preset keyword may include a data number, and the similarity is determined when the data number in the database is the same as the data number of the data to be grouped or the data number in the database is empty. In general, when the preset keyword and the corresponding keyword are not empty and different, it may be considered that the data in the database and the data to be grouped do not belong to the same data group.
According to the embodiment of the present disclosure, the preset fields of the data to be grouped and of the data in the database are compared first, and a different strategy for determining the first data is applied depending on the result of the keyword comparison. This improves the accuracy of first data acquisition. Data that does not satisfy the conditions can be excluded directly based on the preset keyword, which reduces the workload of the subsequent similarity comparison, improves the efficiency of obtaining the first data and therefore of data grouping, so that efficient and accurate data grouping can be achieved.
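The keyword-gated selection of steps S116 to S118 can be sketched as below. The field names, the similarity measure, and the threshold values are assumptions. The sketch also rejects a record when the keywords differ, including the case where only the keyword of the data to be grouped is empty, which the source does not address explicitly.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Assumed similarity measure; the source does not name one.
    return SequenceMatcher(None, a, b).ratio()

def is_first_data(new_record, db_record, first_threshold=0.8, second_threshold=0.0):
    """Decide whether db_record qualifies as first data for new_record."""
    new_no = new_record["project_no"]
    db_no = db_record["project_no"]
    score = similarity(new_record["name"], db_record["name"])
    if not db_no:
        # S117: the corresponding keyword is empty, so rely on the stricter threshold.
        return score > first_threshold
    if new_no == db_no:
        # S118: same keyword, so a much lower threshold is enough.
        return score > second_threshold
    # Keywords differ: excluded without relying on similarity.
    return False

new = {"project_no": "P-7", "name": "Road repair tender"}
same_no = {"project_no": "P-7", "name": "Road repair award"}
empty_no_similar = {"project_no": "", "name": "Road repair tender notice"}
empty_no_different = {"project_no": "", "name": "Office furniture"}
other_no = {"project_no": "P-8", "name": "Road repair tender"}

results = [is_first_data(new, r)
           for r in (same_no, empty_no_similar, empty_no_different, other_no)]
```

A production version would test the keywords before computing the score, so that records with a different keyword never reach the similarity comparison.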
In one embodiment, the data in the data group is marked with a corresponding data tag, and obtaining a target data group to which the first data belongs includes:
determining a target data tag of the first data;
and determining a target data group from the database according to the target data label.
In the embodiment of the disclosure, every piece of data in a data group is marked with a data tag, and all data in the same data group carry the same data tag. The target data group to which the first data belongs can therefore be determined from the data tag: the target data tag of the first data is determined, and the data in the database whose data tag is the same as the target data tag forms the target data group. In one example, the data tag may take the form of a data group number, with one number per data group. Typically, the data group number is assigned when the data is stored in the database. One data group number may correspond to one or more pieces of data.
According to the embodiment of the disclosure, the target data group corresponding to the first data is determined through the data tag, which simplifies the process of obtaining the target data group, reduces the data processing workload, and improves the efficiency of obtaining the target data group and of data grouping, so that efficient and accurate data grouping can be achieved.
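As a rough illustration of the tag lookup, the following Python sketch indexes stored records by their group number and retrieves the target data group of a given record. The record layout (dictionaries with a `group_no` key) and both function names are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def build_tag_index(records):
    """Map each data tag (here a group number) to the records that carry it."""
    index = defaultdict(list)
    for record in records:
        index[record["group_no"]].append(record)
    return index


def target_group(index, first_data):
    """Return every record sharing the data tag of the first data."""
    return index.get(first_data["group_no"], [])
```

With such an index, finding the target data group is a single dictionary lookup rather than a scan of the database; in a relational store the same effect could be obtained with an index on the group number column.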
In one embodiment, the obtaining a target data group to which the first data belongs when the preset keyword of the data to be grouped and the corresponding keyword of the first data meet a first preset condition includes:
acquiring a first preset keyword of the data to be grouped and a second preset keyword corresponding to the first data;
under the condition that the first preset keyword and the second preset keyword are the same, acquiring a target data group to which the first data belongs; and/or,
and under the condition that the first preset keyword is null and/or the second preset keyword is null, acquiring a target data group to which the first data belongs.
In the embodiment of the disclosure, to compare the preset keyword of the data to be grouped with the keyword corresponding to the first data, the first preset keyword of the data to be grouped and the second preset keyword corresponding to the first data are obtained and compared. In this embodiment, the first preset condition may include that the first preset keyword and the second preset keyword are the same; in that case the data to be grouped and the first data may be considered closely associated and may belong to the same data group. The first preset condition may further include that at least one of the first preset keyword and the second preset keyword is null: in some application scenarios not every piece of data carries information for the keyword, so when at least one of the keywords is null the two pieces of data may still belong to the same data group, and further judgment and processing are required. In one example, the first preset condition may be either of the above or a combination of them. In some implementations, a plurality of different preset keywords may be determined and a preset condition set for each of them; the preset conditions may be the same or different, and when all of them are satisfied, the data to be grouped and the first data may be considered to possibly belong to the same data group, subject to further judgment. When the first preset condition is met, the target data group to which the first data belongs is acquired.
According to the embodiment of the disclosure, the first preset condition is set to be that the keywords are the same or that at least one keyword is null, so that whether the first data and the data to be grouped may belong to the same data group can be judged accurately, which ensures the accuracy of acquiring the target data group and improves the accuracy of data grouping. The workload of subsequent processing of the target data group is also reduced, which improves the efficiency of acquiring the target data group and of data grouping, so that efficient and accurate data grouping can be achieved.
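The first preset condition can be sketched in a few lines of Python. The helper names are hypothetical, and treating whitespace-only strings as null is an assumption of this sketch rather than something the disclosure states.

```python
from typing import Optional, Sequence


def is_empty(keyword: Optional[str]) -> bool:
    """Treat None and blank strings as a null keyword (an assumption)."""
    return keyword is None or keyword.strip() == ""


def meets_first_condition(first_kw: Optional[str], second_kw: Optional[str]) -> bool:
    """True when the keywords are the same or at least one of them is null."""
    if is_empty(first_kw) or is_empty(second_kw):
        return True
    return first_kw == second_kw


def meets_all_conditions(new: dict, first: dict, keys: Sequence[str]) -> bool:
    """Variant with several preset keywords: every one must satisfy the condition."""
    return all(meets_first_condition(new.get(k), first.get(k)) for k in keys)
```

A `True` result only means that the two pieces of data may belong to the same group; the target data group still has to pass the second preset condition before the data to be grouped is stored in it.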
Fig. 5 is a schematic diagram illustrating a data grouping method according to an exemplary embodiment. Referring to fig. 5, the keywords of the data may be divided by field. For example, one piece of data may include a uniqueness field, a publishing time, a commonality field, and a similarity field. The keyword corresponding to the uniqueness field may include the number of the data; the keyword corresponding to the publishing time is the time at which the data was published; the keyword corresponding to the commonality field may include specific content of the data, including but not limited to the resource interaction party information and the resource interaction place recorded in the data; and the keyword corresponding to the similarity field may include the item name of the data. In one example, when the data to be grouped is bidding data, the uniqueness field may include the item number; the commonality field may include the item number, the tenderer, the agent, and the province code; and the similarity field may include the item name. Data within a preset time interval is acquired according to the publishing time of the data to be grouped. When the data has the uniqueness field, the data whose uniqueness field is consistent is acquired directly and sorted in descending order of the similarity of the similarity field to obtain a data group A. When the data does not have the uniqueness field, the data that lacks the uniqueness field and whose similarity-field similarity reaches a preset threshold is acquired and sorted in descending order of that similarity to obtain a data group B. The data group A and the data group B are then combined to obtain a similar data group C.
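The construction of data groups A, B and C might look like the following Python sketch. It follows one reading of the paragraph above, in which "the data" having or lacking the uniqueness field refers to the record in the database. The field names (`item_no`, `item_name`, `published`), the 180-day window, the 0.6 threshold, the use of `difflib.SequenceMatcher`, and placing group A before group B when combining them are all assumptions made for illustration.

```python
from datetime import date, timedelta
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Stand-in similarity for the similarity field (item name)."""
    return SequenceMatcher(None, a, b).ratio()


def similar_group(new: dict, database: list, window_days: int = 180,
                  threshold: float = 0.6) -> list:
    """Build the similar data group C for a new record (hypothetical helper)."""
    window = timedelta(days=window_days)
    group_a, group_b = [], []
    for record in database:
        # Keep only records published within the preset time interval.
        if abs(record["published"] - new["published"]) > window:
            continue
        score = name_similarity(new["item_name"], record["item_name"])
        if record.get("item_no"):
            # Uniqueness field present: it must be consistent (group A).
            if record["item_no"] == new.get("item_no"):
                group_a.append((score, record))
        elif score >= threshold:
            # No uniqueness field: rely on the similarity threshold (group B).
            group_b.append((score, record))
    # Sort each group in descending order of similarity, then combine into C.
    group_a.sort(key=lambda pair: pair[0], reverse=True)
    group_b.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in group_a + group_b]
```

Sorting on the score alone, rather than on the `(score, record)` tuples, avoids comparing dictionaries when two records have equal similarity.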
The similar data group C is then traversed, and for each piece of data in it, the common fields of that data and of the data to be grouped are checked against a preset condition. In this embodiment, the preset condition includes that the common fields do not conflict and that the fields with values meet a preset value. Two pieces of data are considered not to conflict on a common field when both have a value and the values are equal, when neither has a value, or when only one of them has a value. The requirement on fields with values may be set so that, when the two pieces of data are compared, at least n fields meet a specific condition, for example that both have a value and the values do not conflict. If the condition is met, the group number of that data is acquired, and the data with the same group number is retrieved from the database to obtain a data group F. It is then judged whether every piece of data under that group number does not conflict with the data to be grouped on the common fields. If so, the judgment ends, the data to be grouped is assigned that group number and stored, and the procedure ends; if not, the next piece of data in the data group C is judged, until all data in the data group C has been traversed. If all judgments are finished and no group number has been obtained for the data to be grouped, a new group number is assigned to the data to be grouped and the data is stored.
According to the embodiment of the disclosure, after the data to be grouped is obtained, data similar to it is searched for in the database (the data in the database has already been grouped). After the data group C is obtained, each piece of data in it is judged in turn for conflict with the data to be grouped. Where there is no conflict, the corresponding data group F is determined from the group number of the non-conflicting data; if none of the data in the data group F conflicts with the data to be grouped, the data to be grouped belongs to that group and is assigned the group number and stored. Data grouping is thereby achieved efficiently and accurately: data from different sources can be integrated and screened, and the data group is determined from the association relationships among the data, which reduces the workload and improves the user experience. With this embodiment, bidding data can be grouped, so that different types of documents released at different times for the same project are associated and placed in one group.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not limited to the exact order illustrated and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
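The traversal and group assignment can be sketched in Python as follows. The field names in `COMMON_FIELDS`, the `min_shared` parameter standing in for "at least n fields", and the `next_group_no` iterator that supplies fresh group numbers are hypothetical names chosen for this sketch; the candidate list is assumed to be the similar data group C, already sorted.

```python
from typing import Iterator

# Assumed common fields for bidding data, following the example in the text.
COMMON_FIELDS = ("item_no", "tenderer", "agent", "province_code")


def has_value(value) -> bool:
    return value not in (None, "")


def conflicts(a: dict, b: dict) -> bool:
    """A common field conflicts only when both sides have values that differ."""
    return any(has_value(a.get(f)) and has_value(b.get(f)) and a[f] != b[f]
               for f in COMMON_FIELDS)


def shared_values(a: dict, b: dict) -> int:
    """Count common fields that both records fill with the same value."""
    return sum(1 for f in COMMON_FIELDS
               if has_value(a.get(f)) and a.get(f) == b.get(f))


def assign_group(new: dict, candidates: list, database: list,
                 next_group_no: Iterator[int], min_shared: int = 1):
    """Store `new` under an existing group number if one fits, else a new one."""
    for cand in candidates:  # traverse the similar data group C
        if conflicts(new, cand) or shared_values(new, cand) < min_shared:
            continue
        # Data group F: every stored record with the candidate's group number.
        group_f = [r for r in database if r["group_no"] == cand["group_no"]]
        if all(not conflicts(new, r) for r in group_f):
            new["group_no"] = cand["group_no"]
            break
    else:
        # No candidate's group accepted the record: open a new group.
        new["group_no"] = next(next_group_no)
    database.append(new)
    return new["group_no"]
```

The `for ... else` form mirrors the text: the `else` branch runs only when the loop finishes without a `break`, that is, when all candidates have been judged and no group number was obtained.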
Based on the same inventive concept, the embodiment of the present disclosure further provides a data grouping device for implementing the above-mentioned data grouping method. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the apparatus for grouping data provided below may refer to the limitations in the above method for grouping data, and are not described herein again.
In one embodiment, as shown in fig. 6, there is provided a data grouping apparatus 600, including:
a first obtaining module 610, configured to obtain first data with similarity greater than a first preset value to data to be grouped from a database, where the database includes multiple data groups;
a second obtaining module 620, configured to obtain a target data group to which the first data belongs when a preset keyword of the to-be-grouped data and a keyword corresponding to the first data meet a first preset condition;
a grouping module 630, configured to store the data to be grouped into the target data group when the preset keyword and a keyword corresponding to second data in the target data group meet a second preset condition, where the second data is data in the target data group other than the first data.
In one embodiment, the first obtaining module includes:
the first obtaining submodule is used for obtaining a plurality of first data with similarity greater than a first preset value with the data to be grouped from a database;
the sorting module is used for carrying out descending sorting on the plurality of first data according to the similarity to obtain a first data group;
and the second acquisition submodule is used for sequentially acquiring the first data from the first data group.
In one embodiment, the first obtaining module includes:
the third obtaining submodule is used for obtaining preset keywords of the data to be grouped and corresponding keywords of the data in the database;
and the first determining module is used for determining the data as the first data under the condition that the similarity between the preset keyword and the corresponding keyword is greater than a second preset value.
In one embodiment, the first obtaining module includes:
the fourth obtaining submodule is used for obtaining preset keywords of the data to be grouped and keywords corresponding to the data in the database;
a second determining module, configured to obtain a similarity between the data and the data to be grouped when the corresponding keyword is empty, and determine that the data is the first data when the similarity is greater than a first threshold; and/or,
a third determining module, configured to obtain a similarity between the data and the data to be grouped under the condition that the preset keyword is the same as the corresponding keyword, and determine that the data is the first data under the condition that the similarity is greater than a second threshold.
In one embodiment, the data in the data group is marked with a corresponding data tag, and the second obtaining module includes:
a fourth determining module, configured to determine a target data tag of the first data;
and the fifth determining module is used for determining a target data group from the database according to the target data label.
In one embodiment, the second obtaining module includes:
a fifth obtaining submodule, configured to obtain a first preset keyword of the to-be-grouped data and a second preset keyword corresponding to the first data;
a sixth obtaining sub-module, configured to obtain a target data group to which the first data belongs, when the first preset keyword and the second preset keyword are the same; and/or,
and the seventh obtaining submodule is used for obtaining the target data group to which the first data belongs under the condition that the first preset keyword is null and/or the second preset keyword is null.
The modules in the above data grouping apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data such as database data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of grouping data.
Those skilled in the art will appreciate that the configuration shown in fig. 7 is a block diagram of only a portion of the configuration associated with embodiments of the present disclosure, and does not constitute a limitation on the computing devices to which embodiments of the present disclosure may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the embodiments of the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, databases, or other media used in the embodiments provided by the disclosure may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided by the disclosure may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processors referred to in the embodiments provided by the disclosure may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, or data processing logic devices based on quantum computing.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express a few implementations of the embodiments of the present disclosure, and the descriptions thereof are specific and detailed, but not construed as limiting the scope of the claims of the embodiments of the present disclosure. It should be noted that, for those skilled in the art, variations and modifications can be made without departing from the concept of the embodiments of the present disclosure, and these are all within the scope of the embodiments of the present disclosure. Therefore, the protection scope of the embodiments of the present disclosure should be subject to the appended claims.

Claims (10)

1. A method for grouping data, the method comprising:
acquiring first data with similarity greater than a first preset value with data to be grouped from a database, wherein the database comprises a plurality of data groups;
under the condition that the preset keywords of the data to be grouped and the corresponding keywords of the first data meet a first preset condition, acquiring a target data group to which the first data belong;
and storing the data to be grouped into the target data group under the condition that the preset keywords and the keywords corresponding to second data in the target data group accord with a second preset condition, wherein the second data is data in the target data group except the first data.
2. The method of claim 1, wherein the obtaining first data with similarity greater than a first preset value to the data to be grouped from the database comprises:
acquiring a plurality of first data with similarity greater than a first preset value with the data to be grouped from a database;
the plurality of first data are arranged in a descending order according to the similarity to obtain a first data group;
and sequentially acquiring first data from the first data group.
3. The method of claim 1, wherein the obtaining first data with similarity greater than a first preset value to the data to be grouped from the database comprises:
acquiring preset keywords of data to be grouped and corresponding keywords of data in a database;
and determining the data as first data under the condition that the similarity between the preset keyword and the corresponding keyword is greater than a second preset value.
4. The method of claim 1, wherein the obtaining first data with similarity greater than a first preset value to the data to be grouped from the database comprises:
acquiring preset keywords of data to be grouped and keywords corresponding to data in a database;
acquiring the similarity between the data and the data to be grouped under the condition that the corresponding keyword is empty, and determining the data to be first data under the condition that the similarity is greater than a first threshold; and/or,
and under the condition that the preset keyword is the same as the corresponding keyword, acquiring the similarity between the data and the data to be grouped, and under the condition that the similarity is greater than a second threshold value, determining the data to be first data.
5. The method of claim 1, wherein the data in the data group is labeled with a corresponding data tag, and obtaining the target data group to which the first data belongs comprises:
determining a target data tag of the first data;
and determining a target data group from the database according to the target data label.
6. The method according to claim 1, wherein the obtaining a target data group to which the first data belongs when the preset keyword of the data to be grouped and the corresponding keyword of the first data meet a first preset condition comprises:
acquiring a first preset keyword of the data to be grouped and a second preset keyword corresponding to the first data;
under the condition that the first preset keyword and the second preset keyword are the same, acquiring a target data group to which the first data belongs; and/or,
and under the condition that the first preset keyword is null and/or the second preset keyword is null, acquiring a target data group to which the first data belongs.
7. An apparatus for grouping data, the apparatus comprising:
a first obtaining module, configured to obtain first data with similarity greater than a first preset value to data to be grouped from a database, wherein the database comprises a plurality of data groups;
a second obtaining module, configured to obtain a target data group to which the first data belongs when a preset keyword of the to-be-grouped data and a keyword corresponding to the first data meet a first preset condition;
and the grouping module is used for storing the data to be grouped into the target data group under the condition that the preset keywords and the keywords corresponding to second data in the target data group accord with a second preset condition, wherein the second data is data in the target data group except the first data.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of grouping of data according to any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of grouping of data according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, realizes the steps of the method for grouping data according to any one of claims 1 to 6.
CN202211578257.6A 2022-12-06 2022-12-06 Data grouping method and device Pending CN115905637A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211578257.6A CN115905637A (en) 2022-12-06 2022-12-06 Data grouping method and device

Publications (1)

Publication Number Publication Date
CN115905637A true CN115905637A (en) 2023-04-04

Family

ID=86494612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211578257.6A Pending CN115905637A (en) 2022-12-06 2022-12-06 Data grouping method and device

Country Status (1)

Country Link
CN (1) CN115905637A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination