CN117370538A - Data processing method and device, equipment and medium - Google Patents

Data processing method and device, equipment and medium

Info

Publication number
CN117370538A
Authority
CN
China
Prior art keywords
word
candidate
resource transfer
virtual resource
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311308352.9A
Other languages
Chinese (zh)
Inventor
苏文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311308352.9A
Publication of CN117370538A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a data processing method, apparatus, device and medium, which can be applied to various scenarios such as intelligent traffic, assisted driving, cloud technology and artificial intelligence. The method comprises: if an acquisition request is received, obtaining word extraction logic from a logic library updated before the current moment; extracting candidate words from remark data generated by a virtual resource transfer relationship network based on the obtained word extraction logic; if a candidate word is determined to be a target word based on a first heat of the candidate word in the virtual resource transfer relationship network, extracting, from the remark data, candidate derivative text derived with the target word as its root; and generating new word extraction logic based on a second heat of the candidate derivative text in the virtual resource transfer relationship network, and adding the newly generated word extraction logic to the logic library. In this technical solution the logic library is updated iteratively, so that more comprehensive and more accurate target words can be collected, and the method is suitable for a wide range of scenarios.

Description

Data processing method and device, equipment and medium
Technical Field
The present invention relates to the field of computer technology, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a computer readable medium.
Background
It is understood that a virtual resource transfer relationship network is a network structure formed by transferring virtual resources between a plurality of objects. In the related art, text (such as words and sentences) is extracted from data generated in a virtual resource transfer relationship network by a trained model. However, because the model is trained on a large number of samples, it is general-purpose, and extraction may need to take the surrounding context into account, so the model is not applicable to certain special scenarios (such as scenarios with high privacy requirements).
Therefore, how to reasonably implement text extraction to be applicable to more scenes is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment and a medium, which promote the rationality of text extraction and are suitable for more scenes.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes: if the acquisition request is received, acquiring word extraction logic from a logic library updated before the current moment; extracting candidate words from remark data generated by the virtual resource transfer relation network based on the obtained word extraction logic; if the candidate word is determined to be a target word based on the first heat of the candidate word in the virtual resource transfer relation network, extracting a candidate derivative text which is derived by taking the target word as a root from the remark data; generating new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and adding the newly generated word extraction logic into the logic library to update the logic library again.
In one embodiment of the present application, based on the foregoing solution, adding the newly generated word extraction logic to the logic library includes: displaying the newly generated word extraction logic in a logic audit interface; and, if an audit operation approving the newly generated word extraction logic is received in the logic audit interface, adding the newly generated word extraction logic to the logic library in response to the audit operation.
In one embodiment of the present application, based on the foregoing solution, there are a plurality of newly generated word extraction logics, and displaying the newly generated word extraction logic in the logic audit interface includes: sorting the plurality of newly generated word extraction logics in descending order of the heat of their corresponding candidate derivative texts to obtain a sorted sequence; and selecting a preset number of word extraction logics from the sorted sequence and displaying the selected word extraction logics in the logic audit interface.
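The sorting and selection described above can be sketched as follows. This is only a minimal illustration, not the implementation of the application: the `GeneratedLogic` record, its fields, the string form of a logic, and the function name `select_for_audit` are all hypothetical, and the heat values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedLogic:
    """Hypothetical record for one newly generated word extraction logic."""
    pattern: str      # assumed textual form of the logic
    source_text: str  # candidate derivative text the logic was generated from
    heat: float       # heat of that candidate derivative text


def select_for_audit(logics, preset_number):
    """Sort by heat in descending order and keep the first `preset_number` items."""
    ranked = sorted(logics, key=lambda item: item.heat, reverse=True)
    return ranked[:preset_number]


# Made-up example data.
generated = [
    GeneratedLogic("thank <word>", "thank Li Guoqing", 12.0),
    GeneratedLogic("<word> is excellent", "Li Guoqing is excellent", 30.0),
    GeneratedLogic("<word> welfare", "Li Guoqing welfare", 7.0),
]
to_display = select_for_audit(generated, preset_number=2)
```

Only the entries in `to_display` would be shown in the logic audit interface; whether they are added to the logic library still depends on the audit operation.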
In a second aspect, embodiments of the present application provide a data processing apparatus, the apparatus including: the acquisition module is configured to acquire word extraction logic from a logic library updated before the current moment if an acquisition request is received; a first extraction module configured to extract candidate words from remark data generated by the virtual resource transfer relationship network based on the obtained word extraction logic; the second extraction module is configured to extract candidate derivative text derived by taking the target word as a root from the remark data if the candidate word is determined to be the target word based on the first heat of the candidate word in the virtual resource transfer relation network; and the generation and updating module is configured to generate new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and add the newly generated word extraction logic into the logic library so as to update the logic library again.
In a third aspect, embodiments of the present application provide an electronic device comprising one or more processors; and a memory for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the data processing method as described above.
In a fourth aspect, embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as described above.
In a fifth aspect, embodiments of the present application provide a computer program product comprising computer instructions which, when executed by a processor, implement a data processing method as described above.
In the technical scheme provided in the embodiment of the application:
Candidate words are extracted from remark data generated by the virtual resource transfer relationship network based on word extraction logic contained in the logic library updated before the current moment; a candidate word is determined to be a target word based on its first heat in the virtual resource transfer relationship network; new word extraction logic is generated based on the second heat, in the virtual resource transfer relationship network, of candidate derivative text derived with the target word as its root; and the logic library is updated again with the new word extraction logic. In this way both the collection of target words and the updating of the word extraction logic are achieved.
That is, the collected target words are fed back to update the logic library, so the logic library is updated iteratively and more comprehensive and more accurate target words can be collected, which suits short-text scenarios such as remark data. Meanwhile, no large sample set needs to be collected to train a model for collecting target words, and collection does not need to take context into account, so collection is simpler and does not involve related privacy data; the method is therefore also suitable for scenarios with high privacy requirements, such as virtual resource transfer based on instant messaging application programs.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
FIG. 1 is a schematic diagram of an exemplary implementation environment in which the techniques of embodiments of the present application may be applied.
FIG. 2 is a flow chart illustrating a data processing method according to an exemplary embodiment of the present application.
FIG. 3 is a schematic diagram of candidate boxes shown in an exemplary embodiment of the present application.
Fig. 4 is a flow chart of a data processing method shown in another exemplary embodiment of the present application.
Figs. 5-1 and 5-2 are schematic diagrams of a target word audit interface shown in an exemplary embodiment of the present application.
Fig. 6 is a flow chart of a data processing method shown in another exemplary embodiment of the present application.
Fig. 7 is a flow chart of a data processing method shown in another exemplary embodiment of the present application.
Figs. 8-1 and 8-2 are schematic diagrams of a logic audit interface shown in an exemplary embodiment of the present application.
Fig. 9 is a flow chart of a data processing method shown in another exemplary embodiment of the present application.
Fig. 10 is a schematic diagram of a data processing method shown in another exemplary embodiment of the present application.
Fig. 11 is a block diagram of a data processing apparatus shown in an exemplary embodiment of the present application.
Fig. 12 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application, as detailed in the appended claims.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In this application, the term "plurality" means two or more. "And/or" describes an association between associated objects and covers three cases; for example, A and/or B may represent: A exists alone, A and B both exist, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
In the related art, text (such as words and sentences) is extracted from data generated in a virtual resource transfer relationship network by a trained model. However, because the model is trained on a large number of samples, it is general-purpose, and extraction may need to take the surrounding context into account, so the model is not applicable to certain special scenarios (such as scenarios with high privacy requirements).
Therefore, to make text extraction more reasonable and applicable to more scenarios, the present application provides a data processing scheme. Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment according to the present application. The implementation environment mainly comprises a terminal device 101 and a server 102, wherein the terminal device 101 and the server 102 communicate through a wired or wireless network.
The terminal device 101 refers to an electronic device that can perform corresponding logical calculation processing. Illustratively, the terminal device 101 includes, but is not limited to, a computer, tablet, notebook, smart phone, and the like.
The server 102 refers to a server that can interact with the terminal device 101 and can also perform corresponding logical calculation processing. The server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, which is not limited herein.
It should be noted that the numbers of the terminal devices 101 and the servers 102 in fig. 1 are merely illustrative, and any number of the terminal devices 101 and the servers 102 may be provided according to actual needs.
In one embodiment of the present application, the data processing method may be performed by the server 102.
Illustratively, if the server 102 receives the collection request, then the word extraction logic is obtained from a logic library updated prior to the current time; then extracting candidate words from remark data generated by the virtual resource transfer relation network based on the obtained word extraction logic; then if the candidate word is determined to be a target word based on the first heat of the candidate word in the virtual resource transfer relation network, extracting a candidate derivative text which is derived by taking the target word as a root from remark data; and generating new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and adding the newly generated word extraction logic into the logic library to update the logic library again.
In one embodiment of the present application, the data processing method may be performed by the terminal device 101.
Illustratively, if the terminal device 101 receives the acquisition request, the word extraction logic is obtained from the logic library updated before the current time; then extracting candidate words from remark data generated by the virtual resource transfer relation network based on the obtained word extraction logic; then if the candidate word is determined to be a target word based on the first heat of the candidate word in the virtual resource transfer relation network, extracting a candidate derivative text which is derived by taking the target word as a root from remark data; and generating new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and adding the newly generated word extraction logic into the logic library to update the logic library again.
In an embodiment of the present application, the data processing method may also be interactively performed by the terminal device 101 and the server 102, which is not described herein again.
Through implementing the embodiment, the rationality of text extraction is improved, and the method is applicable to wide scenes.
The technical solution of the embodiment shown in fig. 1 can be applied to various scenes, including but not limited to intelligent traffic, driving assistance, cloud technology, artificial intelligence, etc.; in practical application, the adjustment can be correspondingly performed according to specific application scenes.
For example, if the method is applied to an intelligent traffic or assisted driving scenario, remark data in the virtual resource transfer relationship network may be generated when communities in the network transfer virtual resources based on a map application program. A map application program is an application program that is related to a map and implements functions such as positioning and navigation based on the map; such application programs generally have high privacy requirements. Accordingly, the terminal device 101 may be an in-vehicle terminal, a navigation terminal, or the like, and the server 102 may be a background computer or the like that interacts with the in-vehicle terminal, the navigation terminal, or the like. By implementing the scheme of this application, the corresponding text can be extracted reasonably in intelligent traffic or assisted driving scenarios with high privacy requirements.
For example, if the method is applied to cloud technology or artificial intelligence, remark data in the virtual resource transfer relationship network may be generated when communities in the network transfer virtual resources based on an instant messaging application program. An instant messaging application program is an application program capable of implementing communication functions based on text, voice, video, and the like; such application programs generally have high privacy requirements. Accordingly, the terminal device 101 may be a smart phone or the like, and the server 102 may be a cloud server or the like that interacts with the smart phone or the like. By implementing the scheme of this application, the corresponding text can be extracted reasonably in cloud technology or artificial intelligence scenarios with high privacy requirements.
It should be noted that the specific embodiments of the present application involve user-related data. When the embodiments of the present application are applied to specific products or technologies, the permission or consent of the user needs to be obtained, and the collection, use and processing of the related data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.
Various implementation details of the technical solutions of the embodiments of the present application are set forth in detail below:
Referring to fig. 2, fig. 2 is a flow chart illustrating a data processing method that may be performed by server 102 according to one embodiment of the present application. As shown in fig. 2, the data processing method at least includes S201 to S204, and is described in detail as follows:
S201, if an acquisition request is received, word extraction logic is obtained from a logic library updated before the current moment.
An acquisition request in the embodiments of the present application refers to a request for indicating acquisition of a candidate word, which may be issued by a demander (also referred to as an acquisition demander, for example, a related staff member) who acquires the candidate word. The acquisition request may be issued by the terminal device, and the terminal device may then send the acquisition request to the server, where the server receives the acquisition request sent by the terminal device.
In the embodiment of the application, the logic library refers to a database for storing word extraction logic, where word extraction logic refers to a strategy formulated for extracting candidate words. Illustratively, the logic library contains a plurality of word extraction logics, which may be grouped by type, scenario, and so on.
For example, please refer to table 1, which is an exemplary logical library grouped by type.
TABLE 1
Referring to table 2, another example logical library grouped by scene is shown.
TABLE 2
It should be noted that, in the embodiment of the present application, the logic library is updated continuously; therefore, after receiving the acquisition request, the server in the embodiment of the application may acquire word extraction logic from the logic library updated before the current moment.
In this embodiment of the present application, the logic library updated before the current time refers to the logic library as updated at a time before (i.e., earlier than) the current time.
For example, let the current time be T1 and the time at which the logic library was first updated be T0, where T0 is earlier than T1; then the logic library updated before the current time may be the logic library updated in the time interval [T0, T1].
In one embodiment of the present application, the process of obtaining the word extraction logic from the logic library updated before the current time in S201 may include:
acquiring an update record of a logic library, wherein the update record comprises a plurality of update times;
selecting an update time nearest to the current time from a plurality of update times;
obtaining word extraction logic from the logic library updated at the selected update time.
In this alternative embodiment, each update of the logic library is recorded in the update record; optionally, the update record includes, but is not limited to, the time at which the logic library was updated (i.e., the update time), an identification of the updated word extraction logic (e.g., a sequence number or name), the number of updated word extraction logics (i.e., the update number), etc.
Therefore, in an alternative embodiment, the update time closest to the current time may be determined through a plurality of update times contained in the update record of the logic library, and the word extraction logic is obtained from the logic library updated by the update time closest to the current time; that is, the word extraction logic is obtained from a newly updated logic library in an alternative embodiment.
By implementing the alternative embodiment, the word extraction logic is obtained from the latest updated logic library, so that the current latest word extraction logic can be ensured, and the accuracy of extracting the candidate words based on the latest word extraction logic is further improved.
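Selecting the most recent update from the update record can be sketched as below. This is a minimal sketch under stated assumptions: the update record is modelled as a list of dictionaries, and the keys `update_time`, `logic_ids` and `update_number` as well as the function name `latest_update_time` are hypothetical names chosen for illustration.

```python
from datetime import datetime


def latest_update_time(update_record, current_time):
    """Return the update time nearest to, and earlier than, `current_time`.

    Returns None when the record holds no update earlier than `current_time`.
    """
    earlier = [
        entry["update_time"]
        for entry in update_record
        if entry["update_time"] < current_time
    ]
    return max(earlier) if earlier else None


# Made-up update record; field names are assumptions.
update_record = [
    {"update_time": datetime(2023, 10, 1, 9, 0), "logic_ids": [1, 2], "update_number": 2},
    {"update_time": datetime(2023, 10, 5, 14, 30), "logic_ids": [3], "update_number": 1},
    {"update_time": datetime(2023, 10, 9, 8, 0), "logic_ids": [4], "update_number": 1},
]
selected = latest_update_time(update_record, current_time=datetime(2023, 10, 8, 0, 0))
```

The word extraction logic would then be read from the logic library as it stood at `selected`; how a library snapshot is looked up by update time is outside this sketch.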
S202, extracting candidate words from remark data generated by the virtual resource transfer relation network based on the obtained word extraction logic.
In the embodiment of the application, the server acquires the word extraction logic, and then can extract the candidate words from remark data generated by the virtual resource transfer relation network based on the acquired word extraction logic.
The candidate words in the embodiment of the application refer to words extracted from remark data generated by the virtual resource transfer relationship network based on the acquired word extraction logic; they may be words in the ordinary sense, or terms, phrases and the like expanded from such words.
Remark data in embodiments of the present application refers to data that annotates a transfer of virtual resources; it typically appears in payment operations, transfer operations, red packet operations, and the like. Virtual resources include, but are not limited to, monetary resources, futures, etc.
In one embodiment of the present application, the virtual resource transfer relationship network includes a plurality of communities. It can be understood that the virtual resource transfer relationship network is a network structure formed by transferring virtual resources among a plurality of objects; taking each object as a center yields a community centered on that object. That is, a community includes a central object and the associated objects (typically a plurality) associated with the central object.
Accordingly, the process of extracting the candidate word from the remark data generated by the virtual resource transfer relationship network based on the obtained word extraction logic in S202 may include:
for each community, obtaining transfer data generated by virtual resource transfer based on an application program in the community, and obtaining data related to resource transfer remarks from the transfer data;
combining data corresponding to the communities and related to the resource transfer remarks to obtain remark data;
candidate words are extracted from the remark data based on the obtained word extraction logic.
That is, in an alternative embodiment, the server obtains transfer data generated by performing virtual resource transfer based on an application program (for example, the instant messaging application program, the map application program, etc. described in the foregoing embodiment) in each community, that is, obtains transfer data corresponding to each of the communities; then, data related to the resource transfer remarks are obtained from transfer data respectively corresponding to the communities, namely, the data related to the resource transfer remarks respectively corresponding to the communities are obtained; and combining data related to the resource transfer remarks corresponding to the communities respectively to obtain remark data, so that candidate words can be extracted from the remark data based on the obtained word extraction logic.
For example, let the virtual resource transfer relationship network be denoted by C, where C = [C1', C2', …, Cn'] and C1', C2', …, Cn' each denote a community. Transfer data generated by virtual resource transfer based on the application program in community C1' is obtained, and data C1'_b1' related to resource transfer remarks is obtained from that transfer data; transfer data generated by virtual resource transfer based on the application program in community C2' is obtained, and data C2'_b2' related to resource transfer remarks is obtained from that transfer data; and so on, until transfer data generated by virtual resource transfer based on the application program in community Cn' is obtained, and data Cn'_bn' related to resource transfer remarks is obtained from that transfer data. Then the data C1'_b1', C2'_b2', …, Cn'_bn' corresponding to the communities C1', C2', …, Cn' are combined to obtain the remark data, and candidate words are extracted from the remark data based on the word extraction logic.
By implementing the alternative embodiment, remark data generated by the virtual resource transfer relation network can be simply, conveniently and quickly obtained, so that powerful support is provided for extracting candidate words.
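The per-community collection, merging and extraction steps can be sketched as follows. The application does not specify the form of a word extraction logic, so this sketch assumes, purely for illustration, that each logic is a regular expression with one capture group; the dictionary layout of the transfer data, the `remark` key and both function names are likewise hypothetical.

```python
import re


def collect_remark_data(communities):
    """Merge the remark-related data of every community into one list."""
    remarks = []
    for transfers in communities.values():
        for transfer in transfers:
            remark = transfer.get("remark")
            if remark:  # skip transfers without a remark
                remarks.append(remark)
    return remarks


def extract_candidate_words(remarks, patterns):
    """Apply each assumed regex-style extraction logic to each remark."""
    candidates = set()
    for remark in remarks:
        for pattern in patterns:
            for match in re.finditer(pattern, remark):
                candidates.add(match.group(1))
    return candidates


# Made-up transfer data for two communities.
communities = {
    "C1": [{"amount": 5, "remark": "thank Li Guoqing"}, {"amount": 3, "remark": ""}],
    "C2": [{"amount": 8, "remark": "thank Li Guoqing for the help"}],
}
patterns = [r"thank (\w+ \w+)"]  # one hypothetical word extraction logic

remark_data = collect_remark_data(communities)
candidate_words = extract_candidate_words(remark_data, patterns)
```

Only the remark field of each transfer is read here, which mirrors the point that extraction works on short remark text without the surrounding context.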
S203, if the candidate word is determined to be the target word based on the first heat of the candidate word in the virtual resource transfer relationship network, candidate derivative text derived with the target word as its root is extracted from the remark data.
In the embodiment of the application, the server extracts the candidate word, then calculates the first heat of the candidate word in the virtual resource transfer relation network, and determines whether the candidate word can be used as the target word based on the first heat of the candidate word in the virtual resource transfer relation network.
In the embodiment of the application, the first heat degree of the candidate word in the virtual resource transfer relation network refers to a parameter for representing/reflecting how frequently the candidate word is used in the virtual resource transfer relation network.
In the embodiment of the application, the target word refers to a word required by the acquisition demander. A candidate word, by contrast, is a word extracted based on the word extraction logic and is not necessarily a word required by the acquisition demander, so whether the candidate word can be used as the target word needs to be determined based on the first heat of the candidate word in the virtual resource transfer relationship network. It will be appreciated that a target word has some meaning in use, but not necessarily a real-world meaning; for example, a low-security group may internally designate a word as an internal communication identifier, and for the sake of concealment that word may have no real-world meaning at all.
In the embodiment of the application, whether the candidate word can be used as the target word is determined based on the first heat of the candidate word in the virtual resource transfer relation network, which comprises the following two cases:
Case 1: it is determined, based on the first heat of the candidate word in the virtual resource transfer relationship network, that the candidate word can be used as the target word. In this case, candidate derivative text derived with the target word as its root is extracted from the remark data.
The candidate derivative text refers to text (e.g., a word or a sentence) derived with the target word as its root. Derivation includes, but is not limited to, adding a prefix and adding a suffix; that is, the derived text may be text obtained by adding a prefix to the target word (denoted "xxx target word"), text obtained by adding a suffix to the target word (denoted "target word xxx"), or text obtained by adding both a prefix and a suffix to the target word (denoted "xxx target word xxx").
For example, please refer to table 3, which is an exemplary candidate derivative text.
TABLE 3 Table 3
It may be understood that in S202, the candidate word is extracted from the remark data generated by the virtual resource transfer relationship network based on the obtained word extraction logic, in S203, after the candidate word is determined to be the target word, the candidate derivative text corresponding to the candidate word is extracted from the remark data generated by the virtual resource transfer relationship network, specifically, the candidate derivative text corresponding to the candidate word may be extracted through text recognition, and in this case, the extraction based on the word extraction logic is not needed.
For example, the candidate word "Li Guoqing" is extracted based on the word extraction logic and is determined to be the target word based on its first heat in the virtual resource transfer relationship network; the candidate derivative text corresponding to the target word "Li Guoqing" is then extracted. Illustratively, the candidate derivative texts extracted for the target word "Li Guoqing" are "thank Li Guoqing", "Li Guoqing is partnership welfare", "thank Li Guoqing is partnership welfare", "Li Guoqing is truly excellent", "Li Guoqing true bar", and so on.
In case 2, it is determined, based on the first heat of the candidate word in the virtual resource transfer relation network, that the candidate word cannot be used as the target word; in this case, no processing is performed, or acquisition failure information is generated and returned to the acquisition demand party, or the flow returns to S201 and is executed again, and so on.
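As an illustrative, non-limiting sketch of the derivative-text extraction in case 1, the following Python function scans remark texts for the target word and groups the hits by the kind of derivation (prefix, suffix, or both). The function name `extract_derivatives` and the plain substring match standing in for "text recognition" are assumptions made for illustration only; the embodiment does not prescribe them.

```python
from typing import Dict, List


def extract_derivatives(remarks: List[str], target_word: str) -> Dict[str, List[str]]:
    """Group remark texts derived from target_word by derivation type.

    Hypothetical helper: a simple substring search stands in for the
    text recognition mentioned in the description.
    """
    groups: Dict[str, List[str]] = {"prefix": [], "suffix": [], "both": []}
    for text in remarks:
        idx = text.find(target_word)
        # Skip texts without the root, and the bare root itself.
        if idx == -1 or text == target_word:
            continue
        has_prefix = idx > 0
        has_suffix = idx + len(target_word) < len(text)
        if has_prefix and has_suffix:
            groups["both"].append(text)      # "xxx target word xxx"
        elif has_prefix:
            groups["prefix"].append(text)    # "xxx target word"
        else:
            groups["suffix"].append(text)    # "target word xxx"
    return groups
```

Such a function would only be called once the candidate word has been accepted as the target word; in case 2 it is simply not invoked.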
S204, generating new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and adding the newly generated word extraction logic into the logic library to update the logic library again.
In the embodiment of the application, after extracting the candidate derivative text derived with the target word as the root, the server may calculate the second heat of the candidate derivative text in the virtual resource transfer relation network, determine based on the second heat whether new word extraction logic can be generated from the candidate derivative text, and update the logic library again based on the new word extraction logic.
In the embodiment of the application, the second heat degree of the candidate derivative text in the virtual resource transfer relation network refers to a parameter for representing/reflecting how frequently the candidate derivative text is used in the virtual resource transfer relation network.
In the embodiment of the application, based on the second heat of the candidate derivative text in the virtual resource transfer relation network, whether new word extraction logic can be generated based on the candidate derivative text is determined, which comprises the following two cases:
In case 1, it is determined, based on the second heat of the candidate derivative text in the virtual resource transfer relation network, that new word extraction logic can be generated from the candidate derivative text; in this case, the new word extraction logic is generated based on the candidate derivative text.
The meaning of the new word extraction logic is the same as that described in the previous embodiment, except that the new word extraction logic is newly generated based on the candidate derivative text obtained this time, so as to be used for extracting the candidate word in the subsequent stage.
In case 2, it is determined, based on the second heat of the candidate derivative text in the virtual resource transfer relation network, that new word extraction logic cannot be generated from the candidate derivative text; in this case, no processing is performed, or acquisition failure information is generated and returned to the acquisition demand party, or the flow returns to S201 and is executed again.
For example, continuing the foregoing example, the candidate derivative texts are "thank Li Guoqing", "Li Guoqing is a partnership welfare", "thank Li Guoqing is a partnership welfare", "Li Guoqing is truly excellent", and "Li Guoqing true bar". The second heat of each of these candidate derivative texts in the virtual resource transfer relationship network is calculated respectively, and new word extraction logic is then generated based on these second heats. Assuming that the newly generated word extraction logic is word extraction logic a11' and word extraction logic a13', the word extraction logic a11' and the word extraction logic a13' are added into the logic library (for example, the logic library shown in Table 1 or Table 2) so as to update the logic library again.
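By way of a hedged sketch only, S204 could be approximated as follows, under two assumptions that this section does not state: that a piece of word extraction logic can be represented as a pattern in which the position of the root is replaced by a slot, and that a derivative text qualifies when its second heat exceeds a threshold (here the hypothetical parameter `second_threshold`). The name `make_extraction_logic` is likewise illustrative.

```python
import re
from typing import Dict, List


def make_extraction_logic(
    derived_heat: Dict[str, float],
    target_word: str,
    second_threshold: float,
) -> List[str]:
    """Turn sufficiently hot derivative texts into slot patterns.

    Assumption: extraction logic is modelled as a regular expression
    whose named group "word" marks where a candidate word would sit.
    """
    patterns: List[str] = []
    for text, heat in derived_heat.items():
        if heat <= second_threshold or target_word not in text:
            continue  # case 2: no new logic is generated from this text
        prefix, _, suffix = text.partition(target_word)
        pattern = re.escape(prefix) + r"(?P<word>.+?)" + re.escape(suffix)
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns


def update_logic_library(library: List[str], new_logic: List[str]) -> None:
    """Append newly generated logic to the library, skipping duplicates."""
    for pattern in new_logic:
        if pattern not in library:
            library.append(pattern)
```

Under these assumptions, a pattern generated from "thank Li Guoqing" could later be applied with `re.fullmatch` to other remark texts to pull out new candidate words, which is one way to picture the iterative update of the logic library.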
According to the embodiment of the application, the logic library is updated iteratively, so that target words can be acquired more comprehensively and more accurately, and the method is suitable for scenarios with short text, such as remark data. Meanwhile, there is no need to collect a large number of samples to train a model for collecting target words, and the collection of target words does not need to be combined with context, so the collection is simpler and more convenient; no related privacy data is involved, so the method is also suitable for scenarios with higher privacy requirements.
In one embodiment of the present application, another data processing method is provided, which may be performed by the server 102. As shown in fig. 3, the data processing method may further include S301 to S302 after S202.
In the embodiment of the application, the virtual resource transfer relationship network includes a plurality of communities. It can be understood that, since the virtual resource transfer relationship network is a network structure formed by transferring virtual resources among a plurality of objects, wherein each object is taken as a center, a community taking each object as a center can be obtained; i.e., a community includes a central object and associated objects (typically a plurality of associated objects) associated with the central object.
S301 to S302 are described in detail as follows:
S301, counting the total number of occurrence times of candidate words in a plurality of communities within a preset time period.
In the embodiment of the present application, the server extracts the candidate words, and then may count the total number of occurrences of the candidate words in a plurality of communities within a preset period of time.
For example, see Table 4, which illustrates how the total number of times is obtained.
Table 4
As shown in Table 4, in an alternative embodiment, 1 is used to characterize that candidate words occur in the community for a predetermined period of time, and 0 is used to characterize that candidate words do not occur in the community for the predetermined period of time. In practical application, the method can be flexibly adjusted according to specific application scenes.
In one embodiment of the present application, the process of counting the total number of occurrences of the candidate word in the plurality of communities in the preset time period in S301 may include:
selecting a community in which the candidate word is used by the center object from a plurality of communities;
counting the number of associated objects using the candidate words in a preset time period for each selected community;
if the counted number is greater than a first preset number threshold, increasing the total number of times that the candidate word appears in the plurality of communities.
It may be appreciated that, since there may be a situation in which the candidate word is not used by the center object, in order to facilitate statistics, in an alternative embodiment, the server may first select a community in which the candidate word is used by the center object from the communities, then, for each selected community, count the number of associated objects in which the candidate word is used within a preset time period, and determine whether to increase the total number of times for counting the occurrence of the candidate word in the communities based on the relationship between the counted number and the first preset number threshold.
For example, taking the community C1' as an example, the community C1' includes a center object C11 and 4 associated objects C12-C15 associated with the center object C11, wherein the center object C11 uses the candidate word "Li Guoqing", and the number of the 4 associated objects C12-C15 that use the candidate word "Li Guoqing" is counted.
In the alternative embodiment, determining, based on the relationship between the counted number and the first preset number threshold, whether to increase the total number of times that the candidate word appears in the plurality of communities includes the following two cases:
In case 1, if the counted number is greater than the first preset number threshold, the total number of times that the candidate word appears in the plurality of communities may be increased.
For example, in the foregoing example, if the candidate word "Li Guoqing" is used by all of the 4 association objects c12-c15, the counted number is 4, and the first preset number threshold is 3, then the total number of occurrences of the candidate word in the communities may be increased.
In case 2, if the counted number is less than or equal to the first preset number threshold, no processing may be performed at this time, that is, the total number of times for counting occurrence of the candidate word in the plurality of communities is not increased.
For example, in the foregoing example, if only the association object c12 of the 4 association objects c12-c15 uses the candidate word "Li Guoqing", the counted number is 1, and the first preset number threshold is 3, then the total number of times of occurrence of the candidate word in the communities need not be increased.
By implementing the alternative embodiment, the total number of times can be obtained simply, conveniently and quickly by counting the number of associated objects that use the candidate word within the preset time period, which provides strong support for calculating the first heat of the candidate word.
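The count-based statistics described above may be sketched, purely for illustration, as follows. The `Community` data structure, the function name `total_times_by_count`, and the default unit amount of 1 are assumptions introduced here; the embodiment only specifies the selection of communities, the per-community count, and the comparison with the first preset number threshold.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Community:
    # Words used by the center object within the preset time period.
    center_words: Set[str]
    # One set of used words per associated object, same time period.
    associate_words: List[Set[str]]


def total_times_by_count(
    communities: List[Community],
    word: str,
    count_threshold: int,
    unit: int = 1,
) -> int:
    """Count communities whose associated-object usage exceeds the threshold."""
    total = 0
    for community in communities:
        # Only communities whose center object uses the word are selected.
        if word not in community.center_words:
            continue
        users = sum(1 for words in community.associate_words if word in words)
        if users > count_threshold:   # case 1: increase the total
            total += unit
        # case 2 (users <= threshold): the total is left unchanged
    return total
```

With a threshold of 3, a community in which all 4 associated objects use the word contributes one unit, while a community in which only 1 associated object uses it contributes nothing, matching the two worked examples above.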
In one embodiment of the present application, the process of counting the total number of occurrences of the candidate word in the plurality of communities in the preset time period in S301 may include:
selecting a community in which the candidate word is used by the center object from a plurality of communities;
counting the proportion of the association objects using the candidate words in a preset time period for each selected community;
if the counted proportion is larger than a first preset proportion threshold value, the total number of times of occurrence of the candidate words in a plurality of communities is increased.
It may be appreciated that, since there may be a situation in which the candidate word is not used by the center object, in order to facilitate statistics, in an alternative embodiment, the server may first select a community in which the candidate word is used by the center object from the communities, then, for each selected community, count the proportion of the associated objects in which the candidate word is used in the preset time period, and determine whether to increase the total number of times the candidate word appears in the communities based on the relationship between the counted proportion and the first preset proportion threshold.
For example, taking the community C1' as an example, the community C1' includes a center object C11 and 4 associated objects C12-C15 associated with the center object C11, wherein the center object C11 uses the candidate word "Li Guoqing", and the proportion of the 4 associated objects C12-C15 that use the candidate word "Li Guoqing" is counted.
In the alternative embodiment, determining, based on the relationship between the counted proportion and the first preset proportion threshold, whether to increase the total number of times that the candidate word appears in the plurality of communities includes the following two cases:
In case 1, if the counted proportion is greater than the first preset proportion threshold, the total number of times that the candidate word appears in the plurality of communities may be increased.
For example, in the foregoing example, if the candidate word "Li Guoqing" is used by all of the 4 association objects c12-c15, the ratio obtained by statistics is 100%, and the first preset ratio threshold value is 90%, then the total number of times of occurrence of the candidate word in a plurality of communities may be increased.
In case 2, if the counted proportion is less than or equal to the first preset proportion threshold, no processing may be performed at this time, that is, the total number of times for counting occurrence of the candidate word in the plurality of communities is not increased.
For example, in the foregoing example, if only the association object c12 of the 4 association objects c12-c15 uses the candidate word "Li Guoqing", the counted proportion is 25%, and the first preset proportion threshold is 90%, then the total number of times of occurrence of the candidate word in the communities need not be increased.
By implementing the alternative embodiment, the total number of times can be obtained simply, conveniently and accurately by counting the proportion of associated objects that use the candidate word within the preset time period, which provides strong support for calculating the first heat of the candidate word.
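The proportion-based variant may be sketched in the same spirit. Here each community is reduced, as an assumption for brevity, to a pair consisting of a flag for the center object and a list of flags for the associated objects; the name `total_times_by_proportion` and the handling of a community with no associated objects (it is skipped) are likewise illustrative choices not taken from the description.

```python
from typing import List, Tuple

# (center object uses the word, [associated object i uses the word, ...])
CommunityUsage = Tuple[bool, List[bool]]


def total_times_by_proportion(
    communities: List[CommunityUsage],
    ratio_threshold: float,
    unit: int = 1,
) -> int:
    """Count communities whose share of using associated objects exceeds the threshold."""
    total = 0
    for center_uses, associate_uses in communities:
        # Skip communities whose center object does not use the word;
        # also skip empty communities to avoid dividing by zero.
        if not center_uses or not associate_uses:
            continue
        ratio = sum(associate_uses) / len(associate_uses)
        if ratio > ratio_threshold:   # case 1: increase the total
            total += unit
    return total
```

With a threshold of 0.9, a community where 4 of 4 associated objects use the word (100%) is counted, and one where 1 of 4 does (25%) is not.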
In an alternative embodiment, increasing the total number of times that the candidate word appears in the plurality of communities may mean adding 1 (i.e., 1 unit amount) to the total for each such community; in practical application, the unit amount of the increase may be set flexibly according to the specific application scenario.
S302, taking the counted total times as the first heat of the candidate words in the virtual resource transfer relation network, or taking the ratio of the counted total times to a preset time period as the first heat of the candidate words in the virtual resource transfer relation network.
After the total times are obtained in the embodiment of the application, the total times can be directly used as the first heat of the candidate word in the virtual resource transfer relation network, or the ratio of the total times to the preset time period can be used as the first heat of the candidate word in the virtual resource transfer relation network.
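The two options for S302 can be written down in a few lines. In this sketch the function name `first_heat` and the choice of days as the unit of the preset time period are assumptions for illustration; the description does not fix a unit.

```python
from typing import Optional


def first_heat(total_times: int, period_days: Optional[float] = None) -> float:
    """Return the first heat from the counted total.

    If period_days is None, the total itself is used as the heat;
    otherwise the heat is the total divided by the length of the period.
    """
    if period_days is None:
        return float(total_times)
    if period_days <= 0:
        raise ValueError("the preset time period must be positive")
    return total_times / period_days
```

Dividing by the period length makes heats comparable when different preset time periods are used, whereas the raw total is sufficient when the period is fixed.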
It should be noted that, for the detailed description of S201 to S204 shown in fig. 3, please refer to S201 to S204 shown in fig. 2; the details are not repeated here.
In the embodiment of the application, the total number of times is obtained through statistics, and either the total number of times or the ratio of the total number of times to the preset time period is used as the first heat of the candidate word in the virtual resource transfer relation network; the overall process of obtaining the first heat is simple and close to the actual usage situation.
In one embodiment of the present application, another data processing method is provided, which may be performed by the server 102. As shown in fig. 4, the data processing method may further include S401 to S403 after S202.
S401 to S403 are described in detail as follows:
S401, detecting the magnitude relation between the first heat and a first preset heat threshold value of the candidate words in the virtual resource transfer relation network.
In the embodiment of the application, the server calculates the first heat of the candidate word in the virtual resource transfer relation network, and then can detect the magnitude relation between the first heat of the candidate word in the virtual resource transfer relation network and the first preset heat threshold.
In the embodiment of the application, detecting the magnitude relation between the first heat of the candidate word in the virtual resource transfer relation network and the first preset heat threshold includes the following two cases:
In case 1, it is detected that the first heat of the candidate word in the virtual resource transfer relation network is greater than the first preset heat threshold.
In case 2, it is detected that the first heat of the candidate word in the virtual resource transfer relation network is less than or equal to the first preset heat threshold.
S402, if the first heat of the candidate word in the virtual resource transfer relation network is greater than a first preset heat threshold, determining the candidate word as a target word.
In case 1, if the first heat of the candidate word in the virtual resource transfer relation network is detected to be greater than the first preset heat threshold, the usage frequency of the candidate word in the virtual resource transfer relation network is high, so that the candidate word can be determined to be a target word.
In case 2, if the first heat of the candidate word in the virtual resource transfer relation network is detected to be less than or equal to the first preset heat threshold, the usage frequency of the candidate word in the virtual resource transfer relation network is low, so the candidate word is not determined as the target word and no further processing is needed.
S403, adding the target word into the word stock to update the word stock.
In the embodiment of the application, the server determines the candidate word as the target word, and then the target word can be added into the word stock, so that the word stock is updated.
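S401 to S403 amount to a threshold comparison followed by a set update, which may be sketched as below. The function name `update_word_stock`, the use of a Python set for the word stock, and the dictionary mapping candidate words to their first heat are assumptions for illustration; any auditing step is deliberately left out of this sketch.

```python
from typing import Dict, List, Set


def update_word_stock(
    word_stock: Set[str],
    candidate_heat: Dict[str, float],
    heat_threshold: float,
) -> List[str]:
    """Add candidates whose first heat exceeds the threshold to the word stock.

    Returns the candidates that were determined to be target words.
    """
    # Case 1: heat > threshold, the candidate becomes a target word.
    # Case 2: heat <= threshold, the candidate is left unprocessed.
    targets = [word for word, heat in candidate_heat.items() if heat > heat_threshold]
    word_stock.update(targets)
    return targets
```

Note that a candidate whose first heat exactly equals the threshold is not accepted, in line with the "less than or equal to" wording of case 2.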
In one embodiment of the present application, the process of adding the target word to the word stock in S403 may include:
displaying the target word in a target word auditing interface;
if an audit operation indicating that the target word passes the audit is received in the target word auditing interface, adding the target word into the word stock in response to the audit operation.
That is, in an alternative embodiment, the server may display the target word in a target word auditing interface, so that an auditor (e.g., a related staff member) may audit the target word as displayed in the interface; accordingly, the target word is added to the word stock in response to an audit operation indicating that the audit is passed.
For example, FIG. 5-1 shows an exemplary target word auditing interface. As shown in FIG. 5-1, the auditor audits the target word displayed in the target word auditing interface, wherein the auditor can trigger the "allow joining" component to issue an audit operation indicating that the audit is passed, or trigger the "reject joining" component to issue an audit operation indicating that the audit is not passed. Optionally, a component can be triggered by clicking, sliding and the like.
By implementing the alternative embodiment, the target words are audited by the auditor, which ensures the accuracy of the target words, so that more accurate target words can subsequently be extracted based on the accurate target words, and the waste of resources caused by ineffective extraction of target words is avoided to a certain extent.
In one embodiment of the present application, there may be a plurality of target words. That is, in the alternative embodiment, a plurality of candidate words are extracted at the same time and, accordingly, several of the plurality of candidate words are respectively determined as target words, so that there are a plurality of target words.
Accordingly, the process of displaying the target word in the target word audit interface may include:
sorting the plurality of target words in descending order of the heat of the target words to obtain an ordered sequence;
and selecting a preset number of target words from the ordered sequence, and displaying the selected target words in a target word auditing interface.
That is, in the alternative embodiment, when there are a plurality of target words, the target words may be sorted in descending order of heat (i.e., the first heat of the candidate word corresponding to each target word) to obtain an ordered sequence; a preset number of target words are then selected from the ordered sequence and displayed in the target word auditing interface for the auditor to audit.
For example, see table 5 for an exemplary ordered sequence.
TABLE 5
Wherein, heat 1> heat 2> heat 3> heat 4> heat 5 in table 5.
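The sorting and screening step can be illustrated with the short sketch below. It assumes, which the description does not state explicitly, that the preset number of target words is taken from the front of the ordered sequence (i.e., the hottest ones); the name `select_for_audit` is hypothetical.

```python
from typing import Dict, List


def select_for_audit(target_heat: Dict[str, float], preset_number: int) -> List[str]:
    """Sort target words by heat, descending, and keep the first preset_number."""
    ranked = sorted(target_heat.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:preset_number]]
```

Because Python's `sorted` is stable, target words with equal heat keep their original relative order, which makes the displayed list deterministic.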
For example, FIG. 5-2 shows another exemplary target word auditing interface. As shown in FIG. 5-2, a plurality of target words are displayed in the target word auditing interface, wherein each target word corresponds to an "allow joining" component and a "reject joining" component, and the auditor can issue audit operations according to the actual situation.
By way of example, batch operation components, such as a batch permission joining component and a batch rejection joining component, can be displayed in the target word auditing interface, and batch auditing operation can be realized by triggering the batch operation component, so that auditing time of an auditor is saved.
The target word auditing interface can also display an audit remark component. Triggering the audit remark component allows corresponding input (text, voice and the like), so that the auditing process of the target word can be recorded or annotated, which facilitates later traceability management and the like.
By implementing the alternative embodiment, the target words are subjected to sorting and screening treatment, so that auditing pressure is relieved for an auditing party, and auditing efficiency is higher.
In one embodiment of the present application, after the process of adding the target word to the word stock in S403, it may further include:
If the management request is received, carrying out security level detection on the party to be detected based on the target words contained in the word stock to obtain a detection result;
and performing management operation on the party to be detected based on the detection result.
That is, in an alternative embodiment, the server adds the target word to the word stock, and then if the management request is received, the security level detection may be performed on the party to be detected based on the target word contained in the word stock, to obtain a detection result, and the management operation may be performed on the party to be detected based on the detection result.
In this alternative embodiment, the management request may be issued by the party that manages the party to be detected (also referred to as the managing party, e.g., a related staff member). The managing party may issue the management request through a terminal device, the terminal device then sends the management request to the server, and accordingly the server receives the management request sent by the terminal device.
The target words contained in the word stock in the alternative embodiment may correspond to the security level, and further the security level of the to-be-detected party may be detected through the target words used by the to-be-detected party, so as to obtain a detection result.
For example, see table 6 for an example security level for a target word.
Target word    | Target word 1    | Target word 2    | Target word 3    | Target word 5
Security level | Security level 1 | Security level 4 | Security level 2 | Security level 3
Table 6
In table 6, security level 1> security level 2> security level 3> security level 4, i.e., security level 1 is highest and security level 4 is lowest.
For example, with reference to Table 6, if the target word 2 has the highest probability of being used by the party to be detected, the security level of the party to be detected is security level 4, that is, the security level of the party to be detected is low. Further detection (for example, detecting the party to be detected with more related data) or control measures (for example, finding other parties to be detected that are connected with this party) can then be carried out, so as to ensure the security of the related virtual resources, and the like.
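As a rough sketch of the detection described above, the following code maps the target word most frequently used by the party to be detected to its security level, using the correspondence in Table 6. Approximating "highest probability of being used" by the highest usage count in the party's remark texts, the substring match, and the names `SECURITY_LEVEL` and `detect_security_level` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Correspondence taken from Table 6; level 1 is highest, level 4 is lowest.
SECURITY_LEVEL: Dict[str, int] = {
    "target word 1": 1,
    "target word 2": 4,
    "target word 3": 2,
    "target word 5": 3,
}


def detect_security_level(remarks: List[str], levels: Dict[str, int]) -> Optional[int]:
    """Return the security level of the most frequently used target word.

    Returns None when the party uses none of the target words.
    """
    counts: Counter = Counter()
    for text in remarks:
        for word in levels:
            if word in text:
                counts[word] += 1
    if not counts:
        return None
    most_used, _ = counts.most_common(1)[0]
    return levels[most_used]
```

A returned level of 4 would correspond to the low-security case in the example, after which further detection or control measures could be triggered by the managing party.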
By implementing the alternative embodiment, the detection of the party to be detected which needs to be safely controlled is performed based on the target words contained in the word stock, the detection flow is simple, and the detection accuracy is high.
It should be noted that, for the detailed description of S201 to S204 shown in fig. 4, please refer to S201 to S204 shown in fig. 2; the details are not repeated here.
In the embodiment of the application, whether the candidate word is the target word can be determined simply, conveniently and accurately by comparing the first heat of the candidate word in the virtual resource transfer relation network with the first preset heat threshold, and the candidate word determined as the target word is added into the word stock, so that target word collection is achieved and the convenience and accuracy of target word collection are improved.
In one embodiment of the present application, another data processing method is provided, which may be performed by the server 102. As shown in fig. 6, the data processing method may further include S601 to S602 after S203.
In the embodiment of the application, the virtual resource transfer relationship network includes a plurality of communities. It can be understood that, since the virtual resource transfer relationship network is a network structure formed by transferring virtual resources among a plurality of objects, wherein each object is taken as a center, a community taking each object as a center can be obtained; i.e., a community includes a central object and associated objects (typically a plurality of associated objects) associated with the central object.
S601 to S602 are described in detail as follows:
S601, counting the total number of times that the candidate derivative text appears in a plurality of communities within a preset time period.
In the embodiment of the present application, the server extracts the candidate derived text, and then may count the total number of occurrences of the candidate derived text in a plurality of communities within a preset period of time.
For example, referring back to table 4, as shown in table 4, in an alternative embodiment, 1 indicates that the candidate derivative text appears in the community for a predetermined period of time, and 0 indicates that the candidate derivative text does not appear in the community for the predetermined period of time. In practical application, the method can be flexibly adjusted according to specific application scenes.
In one embodiment of the present application, the process of counting the total number of occurrences of the candidate derivative text in the plurality of communities in the preset time period in S601 may include:
selecting, from the plurality of communities, the communities in which the center object uses the candidate derivative text;
counting the number of associated objects using the candidate derived text in a preset time period for each selected community;
if the counted number is greater than a second preset number threshold, the total number of times of occurrence of the candidate derived text in a plurality of communities is increased.
It may be appreciated that, since there may be a situation where the center object does not use the candidate derivative text, in order to facilitate statistics, in an alternative embodiment, the server may first select communities where the center object uses the candidate derivative text from the communities, then, for each selected community, count the number of associated objects where the candidate derivative text is used in a preset time period, and determine whether to increase the total number of occurrences of the candidate derivative text in the communities based on the relationship between the counted number and the second preset number threshold.
For example, taking the community C1' as an example, the community C1' includes a center object C11 and 4 associated objects C12-C15 associated with the center object C11, wherein the center object C11 uses the candidate derivative text "thank Li Guoqing", and the number of the 4 associated objects C12-C15 that use the candidate derivative text "thank Li Guoqing" is counted.
In the alternative embodiment, determining, based on the relationship between the counted number and the second preset number threshold, whether to increase the total number of times that the candidate derivative text appears in the plurality of communities includes the following two cases:
In case 1, if the counted number is greater than the second preset number threshold, the total number of times that the candidate derivative text appears in the plurality of communities may be increased.
For example, in the foregoing example, if the candidate derivative text "thank Li Guoqing" is used by all of the 4 associated objects c12-c15, the counted number is 4, and the second preset number threshold is 3, then the total number of times of occurrence of the candidate derivative text in the communities may be increased.
In case 2, if the counted number is less than or equal to the second preset number threshold, no processing may be performed at this time, that is, the total number of times for counting the occurrence of the candidate derivative text in the plurality of communities is not increased.
For example, in the foregoing example, if only the related object c12 of the 4 related objects c12-c15 uses the candidate derivative text "thank you Li Guoqing", the counted number is 1, and the second preset number threshold is 3, then the total number of times of occurrence of the candidate derivative text in the communities does not need to be increased.
By implementing the alternative embodiment, the total times can be simply and quickly obtained by counting the number of the associated objects of the candidate derivative text used in the preset time period, so that powerful support is provided for calculating the second heat of the candidate derivative text.
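The count-based statistics described above can be sketched as follows. This is a minimal illustration only, not the implementation of the application: the `Community` class, its field names, and the function `total_occurrences_by_count` are hypothetical, and each object is assumed to be represented by the set of texts it used within the preset time period.

```python
from dataclasses import dataclass, field


@dataclass
class Community:
    """Hypothetical community: one center object plus its associated objects."""
    center_texts: set                                      # texts used by the center object in the period
    associated_texts: list = field(default_factory=list)   # one set of texts per associated object


def total_occurrences_by_count(communities, text, count_threshold, unit=1):
    """Return the total number of times `text` is counted across communities."""
    total = 0
    for community in communities:
        # Only communities whose center object uses the text are selected.
        if text not in community.center_texts:
            continue
        # Count the associated objects that use the text.
        users = sum(1 for texts in community.associated_texts if text in texts)
        # Case 1: strictly greater than the threshold increases the total.
        # Case 2: otherwise no processing is performed.
        if users > count_threshold:
            total += unit
    return total


text = "thank you Li Guoqing"
c1 = Community(center_texts={text}, associated_texts=[{text}] * 4)
c2 = Community(center_texts={text}, associated_texts=[{text}, set(), set(), set()])
print(total_occurrences_by_count([c1, c2], text, count_threshold=3))
```

With a threshold of 3, only the community in which all 4 associated objects use the text contributes to the total; the community with a single user does not.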
In one embodiment of the present application, the process of counting the total number of occurrences of the candidate derivative text in the plurality of communities in the preset time period in S601 may include:
selecting communities from the communities that the center object uses the candidate derivative text;
counting the proportion of the associated objects using the candidate derived text in a preset time period for each selected community;
if the counted proportion is larger than a second preset proportion threshold value, the total number of times of occurrence of the candidate derived texts in a plurality of communities is increased.
It may be appreciated that, since there may be a situation where the center object does not use the candidate derivative text in practice, in order to facilitate statistics, in an alternative embodiment, the server may first select communities in which the center object uses the candidate derivative text from the communities, then, for each selected community, count the proportion of associated objects in which the candidate derivative text is used in a preset time period, and determine whether to increase the total number of occurrences of the candidate derivative text in the communities based on the relationship between the counted proportion and the second preset proportion threshold.
For example, take the community C1', which includes a center object c11 and 4 associated objects c12-c15 associated with the center object c11. The center object c11 uses the candidate derivative text "thank you Li Guoqing", and the server counts the proportion of the 4 associated objects c12-c15 that use the candidate derivative text "thank you Li Guoqing".
In an alternative embodiment, determining whether to increase the total number of times the candidate derivative text appears in the communities, based on the relationship between the counted proportion and the second preset proportion threshold, includes the following two cases:
In case 1, if the counted proportion is greater than the second preset proportion threshold, the total number of times the candidate derivative text appears in the communities may be increased.
For example, in the foregoing example, if the candidate derivative text "thank you Li Guoqing" is used by the 4 associated objects c12-c15, the statistical proportion is 100%, and the second preset proportion threshold is 90%, then the total number of times of occurrence of the candidate derivative text in the communities may be increased.
In case 2, if the counted proportion is less than or equal to the second preset proportion threshold, no processing may be performed at this time, that is, the total number of times for counting the occurrence of the candidate derivative text in the plurality of communities is not increased.
For example, in the foregoing example, if the proportion of the 4 associated objects c12-c15 that use the candidate derivative text "thank you Li Guoqing" is 25%, and the second preset proportion threshold is 90%, the total number of times the candidate derivative text appears in the communities does not need to be increased.
By implementing the alternative embodiment, the total times can be simply and accurately obtained by means of proportion statistics of the associated objects of the candidate derivative text used in a preset time period, so that powerful support is provided for calculation of the second heat of the candidate derivative text.
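The proportion-based variant can be sketched in the same way. The dictionary layout (`"center"`, `"associated"`) and the function name are hypothetical, and skipping a community that has no associated objects is an assumption made here to avoid division by zero; the application does not address that case.

```python
def total_occurrences_by_proportion(communities, text, proportion_threshold, unit=1):
    """Return the total number of times `text` is counted across communities."""
    total = 0
    for community in communities:
        # Only communities whose center object uses the text are selected.
        if text not in community["center"]:
            continue
        associated = community["associated"]
        if not associated:
            # Assumption: a community without associated objects contributes nothing.
            continue
        # Proportion of associated objects that use the text in the period.
        proportion = sum(1 for texts in associated if text in texts) / len(associated)
        # Case 1: strictly greater than the threshold increases the total.
        if proportion > proportion_threshold:
            total += unit
    return total


text = "thank you Li Guoqing"
all_use = {"center": {text}, "associated": [{text}] * 4}                     # 100%
one_uses = {"center": {text}, "associated": [{text}, set(), set(), set()]}   # 25%
print(total_occurrences_by_proportion([all_use, one_uses], text, 0.9))
```

Here the 100% community exceeds the 90% threshold and is counted, while the 25% community is not.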
In an alternative embodiment, the adding of the total number of times of occurrence of the candidate derivative text in the multiple communities may be adding 1 to the number of times of occurrence of the candidate derivative text in each community (i.e. 1 unit amount), and in practical application, the added unit amount may be flexibly set according to a specific application scenario.
S602, taking the counted total times as the second heat of the candidate derivative text in the virtual resource transfer relation network, or taking the ratio of the counted total times to the preset time period as the second heat of the candidate derivative text in the virtual resource transfer relation network.
After the total times are obtained in the embodiment of the application, the total times can be directly used as the second heat of the candidate derivative text in the virtual resource transfer relation network, or the ratio of the total times to the preset time period can be used as the second heat of the candidate derivative text in the virtual resource transfer relation network.
It should be noted that, for a detailed description of S201 to S204 shown in fig. 6, please refer to S201 to S204 shown in fig. 2; the description is not repeated here.
In the embodiment of the application, the total number of times is obtained by counting, and either the total number of times or the ratio of the total number of times to the preset time period is used as the second heat of the candidate derivative text in the virtual resource transfer relation network. The whole flow of calculating the second heat is simple and stays close to actual usage.
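As a minimal sketch of S602, the two ways of deriving the second heat can be written as one function. The function name `second_heat` and the convention that the preset time period is given as a positive number (for example, a number of days) are assumptions for illustration; the application does not fix a unit.

```python
def second_heat(total_occurrences, period_length=None):
    """Return the second heat of a candidate derivative text.

    If `period_length` is None, the total number of times is used directly;
    otherwise the ratio of the total to the preset time period is used.
    """
    if period_length is None:
        return total_occurrences
    if period_length <= 0:
        raise ValueError("period_length must be positive")
    return total_occurrences / period_length


print(second_heat(12))      # total used directly
print(second_heat(12, 4))   # total divided by a 4-unit period
```

Using the ratio makes heats comparable across statistics windows of different lengths, whereas using the raw total is simpler when the window is fixed.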
In one embodiment of the present application, another data processing method is provided, which may be performed by the server 102. As shown in fig. 7, the data processing method may include S701 to S703, S201 to S203.
S701, detecting the magnitude relation between the second heat of the candidate derivative text in the virtual resource transfer relation network and a second preset heat threshold.
In the embodiment of the application, the server calculates the second heat of the candidate derived text in the virtual resource transfer relation network, and then can detect the magnitude relation between the second heat of the candidate derived text in the virtual resource transfer relation network and a second preset heat threshold.
In the embodiment of the application, detecting the magnitude relation between the second heat of the candidate derivative text in the virtual resource transfer relation network and the second preset heat threshold includes the following two cases:
Case 1: the second heat of the candidate derivative text in the virtual resource transfer relation network is detected to be greater than the second preset heat threshold.
Case 2: the second heat of the candidate derivative text in the virtual resource transfer relation network is detected to be less than or equal to the second preset heat threshold.
S702, if the second heat of the candidate derivative text in the virtual resource transfer relation network is greater than the second preset heat threshold, determining the candidate derivative text as the target derivative text.
In case 1, if the second heat of the candidate derived text in the virtual resource transfer relation network is detected to be greater than the second preset heat threshold, the candidate derived text is characterized to be used frequently in the virtual resource transfer relation network, so that the candidate derived text can be determined as the target derived text.
In case 2, if the second heat of the candidate derivative text in the virtual resource transfer relation network is detected to be less than or equal to the second preset heat threshold, this indicates that the candidate derivative text is used infrequently in the virtual resource transfer relation network, so the candidate derivative text may be left unprocessed, i.e. it does not need to be determined as the target derivative text.
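The two cases of S701 and S702 amount to a strict threshold comparison. The sketch below assumes, for illustration only, that the second heats are already available as a mapping from candidate derivative text to heat; the function name and the sample heat values are hypothetical.

```python
def select_target_derivative_texts(heat_by_text, heat_threshold):
    """Keep the candidate derivative texts whose second heat exceeds the threshold."""
    # Case 1 (heat > threshold): the text becomes a target derivative text.
    # Case 2 (heat <= threshold): the text is left unprocessed.
    return [text for text, heat in heat_by_text.items() if heat > heat_threshold]


heats = {
    "thank you Li Guoqing": 8,      # illustrative values, not from the application
    "Li Guoqing is excellent": 5,
}
print(select_target_derivative_texts(heats, heat_threshold=5))
```

A heat exactly equal to the threshold falls under case 2, so the second text in the sample is not selected.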
S703, generating new word extraction logic based on the information except the candidate words in the target derivative text.
In the embodiment of the application, the server determines the candidate derivative text as the target derivative text, and then new word extraction logic can be generated based on information except the candidate words in the target derivative text.
For example, continuing the foregoing example, the candidate derivative texts are "thank Li Guoqing", "Li Guoqing is partnership welfare", "thank Li Guoqing is partnership welfare", "Li Guoqing is excellent", "Li Guoqing is excellent". If only "thank Li Guoqing" is determined as the target derivative text, the information in "thank Li Guoqing" other than "Li Guoqing" (i.e., the candidate word or target word) is "thank", and new word extraction logic (e.g., "thank xxx") may be generated based on "thank".
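One possible way to realize S703 is to replace the candidate word in the target derivative text with a placeholder, so that the remaining information becomes a reusable template. This is a sketch under that assumption; the placeholder `"xxx"` follows the notation of the examples, while the function name and the decision to return `None` when nothing remains besides the candidate word are hypothetical.

```python
PLACEHOLDER = "xxx"  # notation borrowed from the examples; the storage format is an assumption


def generate_extraction_logic(target_text, candidate_word, placeholder=PLACEHOLDER):
    """Build word extraction logic from the information other than the candidate word."""
    if candidate_word not in target_text:
        raise ValueError("the target derivative text must contain the candidate word")
    # Information in the target derivative text other than the candidate word.
    remainder = target_text.replace(candidate_word, "").strip()
    if not remainder:
        # Nothing besides the candidate word: no logic can be derived.
        return None
    return target_text.replace(candidate_word, placeholder)


print(generate_extraction_logic("thank Li Guoqing", "Li Guoqing"))
```

For the target derivative text "thank Li Guoqing" and the candidate word "Li Guoqing", the sketch yields the template "thank xxx".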
In one embodiment of the present application, the process of adding the newly generated word extraction logic to the logic library in S204 may include:
displaying the generated word extraction logic in a logic auditing interface;
if an audit operation indicating that the generated word extraction logic passes the audit is received in the logic auditing interface, the generated word extraction logic is added into the logic library in response to the audit operation.
That is, in an alternative embodiment, the server may display the newly generated word extraction logic in a logic audit interface such that the auditor (e.g., an associated staff member) may audit the newly generated word extraction logic based on the newly generated word extraction logic displayed in the logic audit interface, and accordingly, add the newly generated word extraction logic to the logic library in response to an audit operation passed by the audit.
For example, FIG. 8-1 illustrates an exemplary logic audit interface. As shown in FIG. 8-1, the auditor audits the new word extraction logic displayed in the logic audit interface: the auditor can trigger the "allow join" component to issue an audit operation indicating that the audit is passed, or trigger the "reject join" component to issue an audit operation indicating that the audit is failed. Optionally, a component can be triggered by clicking, sliding, etc.
By implementing the alternative embodiment, the auditing party is used for auditing the newly generated word extraction logic, so that the accuracy of the newly generated word extraction logic is ensured, more accurate target words can be extracted based on the accurate newly generated word extraction logic, and the phenomenon of resource waste caused by invalid extraction of the newly generated word extraction logic is avoided to a certain extent.
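The audit gate can be sketched as follows, assuming for illustration that the audit operations received from the interface are collected as a mapping from word extraction logic to a pass/fail flag. The function name and this representation are hypothetical; the interface itself is not modeled.

```python
def apply_audit_decisions(logic_library, decisions):
    """Add to the logic library only the logic whose audit is passed.

    `decisions` maps each newly generated word extraction logic to True
    ("allow join") or False ("reject join").
    """
    for logic, passed in decisions.items():
        # Rejected logic is dropped; logic already in the library is not duplicated.
        if passed and logic not in logic_library:
            logic_library.append(logic)
    return logic_library


library = ["thanks xxx"]
print(apply_audit_decisions(library, {"thank xxx": True, "xxx is excellent": False}))
```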
In one embodiment of the present application, the newly generated word extraction logic may be multiple. That is, in an alternative embodiment, a plurality of candidate derivative texts are extracted simultaneously, and accordingly, a plurality of candidate derivative texts in the plurality of candidate derivative texts are determined as target derivative texts, respectively, and then, for each target derivative text, word extraction logic is generated based on information except for candidate words in the target derivative text, so as to generate a plurality of word extraction logic.
Accordingly, the process of displaying the newly generated word extraction logic in the logic audit interface may include:
sorting the plurality of newly generated word extraction logics according to the sequence from big to small of candidate derivative text heat corresponding to the newly generated word extraction logics to obtain a sorting sequence;
And selecting a preset number of word extraction logics from the ordered sequence, and displaying the selected word extraction logics in a logic auditing interface.
That is, in the case that the number of the word extraction logics to be newly generated is multiple in the alternative embodiment, the plurality of the word extraction logics to be newly generated may be ordered in the order from large to small according to the heat degree of the word extraction logics to be newly generated (that is, the second heat degree of the candidate derivative text corresponding to the word extraction logics to be newly generated), so as to obtain an ordered sequence, then the previous preset number of word extraction logics are selected from the ordered sequence, and the selected preset number of word extraction logics are displayed in the logic auditing interface for the auditor to audit.
For example, see table 7 for an exemplary ordered sequence.
TABLE 7
Wherein, heat 1> heat 2> heat 3> heat 4> heat 5 in table 7.
For example, fig. 8-2 illustrates another exemplary logic audit interface. As shown in fig. 8-2, a plurality of word extraction logics are displayed in the logic auditing interface, each word extraction logic corresponds to an "allow join" component and a "reject join" component, and the auditor can issue audit operations according to actual conditions.
By way of example, batch operation components, such as a batch allow join component and a batch reject join component, may also be displayed in the logical audit interface, and batch audit operations may be implemented by triggering the batch operation components, thereby saving audit time for auditors.
The logic audit interface can also display an audit remark component, and corresponding text input (text, voice and the like) can be realized by triggering the audit remark component, so that the audit process of the word extraction logic can be correspondingly recorded or marked, and later retrospective management and the like are facilitated.
By implementing the alternative embodiment, the newly generated word extraction logic is subjected to sorting and screening processing, so that auditing pressure is relieved for an auditing party, and auditing efficiency is higher.
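The sorting and screening before display can be sketched as a descending sort followed by taking the first preset number of entries. The function name and the sample heats are hypothetical; the heat of each word extraction logic is taken to be the second heat of its corresponding candidate derivative text, as described above.

```python
def select_for_audit(heat_by_logic, preset_number):
    """Return the first `preset_number` word extraction logics, hottest first."""
    ranked = sorted(heat_by_logic.items(), key=lambda item: item[1], reverse=True)
    return [logic for logic, _ in ranked[:preset_number]]


heats = {
    "logic A": 3,   # illustrative values only
    "logic B": 9,
    "logic C": 5,
    "logic D": 7,
}
print(select_for_audit(heats, preset_number=2))
```

Python's `sorted` is stable, so word extraction logics with equal heat keep their original relative order; how ties should be broken is not specified in the application.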
It should be noted that, for a detailed description of S201 to S204 shown in fig. 7, please refer to S201 to S204 shown in fig. 2; the description is not repeated here.
According to the method and the device for generating the word extraction logic, whether the candidate derived text is the target derived text or not can be simply, conveniently and accurately determined through comparison between the second heat of the candidate derived text in the virtual resource transfer relation network and the second preset heat threshold, and new word extraction logic is generated based on information except candidate words in the candidate derived text serving as the target derived text, so that update of the word extraction logic is achieved, and convenience and accuracy of update of the word extraction logic are improved.
One specific scenario of the embodiments of the present application is described in detail below:
referring to fig. 9, fig. 9 is a flowchart illustrating a data processing method according to an embodiment of the present application.
As shown in fig. 9, the data processing method at least includes S901 to S9012, and is described in detail as follows:
S901, if the acquisition request is received, acquiring word extraction logic from a logic library updated before the current moment.
In the embodiment of the application, if the acquisition request sent by the acquisition demand party is received, word extraction logic can be obtained from the updated logic library before the current moment.
The obtaining word extraction logic from the updated logic library before the current moment in the embodiment of the application may include: acquiring an update record of a logic library, wherein the update record comprises a plurality of update times; then selecting the update time nearest to the current time from the plurality of update times; word extraction logic is then obtained from the logic library updated at the selected update time.
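The selection of the most recent update can be sketched as follows, assuming for illustration that the update record is a mapping from update time to the content of the logic library after that update. The function name and this representation are hypothetical.

```python
from datetime import datetime


def logic_from_latest_update(update_record, now):
    """Return the word extraction logic of the update nearest to, and not after, `now`."""
    # Keep only the update times that precede the current time.
    eligible = [moment for moment in update_record if moment <= now]
    if not eligible:
        return []
    # The nearest update time before the current time is the maximum eligible one.
    return list(update_record[max(eligible)])


record = {
    datetime(2023, 10, 1): ["thanks xxx"],
    datetime(2023, 10, 5): ["thanks xxx", "thank you xxx"],
}
print(logic_from_latest_update(record, now=datetime(2023, 10, 6)))
```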
S902, extracting candidate words from remark data generated by the virtual resource transfer relation network based on the acquired word extraction logic.
The remark data in the embodiment of the present application is used to explain the transfer of virtual resources.
In an embodiment of the present application, extracting, based on the obtained word extraction logic, candidate words from remark data generated by the virtual resource transfer relationship network may include: for each community of the virtual resource transfer relation network, obtaining transfer data generated by virtual resource transfer based on an application program in the community, and obtaining data related to resource transfer remarks from the transfer data; combining the data related to the resource transfer remarks corresponding to the communities to obtain remark data; candidate words are then extracted from the remark data based on the retrieved word extraction logic.
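One way to apply word extraction logic such as "thanks xxx" to remark data is to compile each template into a regular expression in which the placeholder becomes a capture group. This is only a sketch: the application does not state that regular expressions are used, and the function names, the `"xxx"` placeholder handling, and the choice to match the whole remark are assumptions.

```python
import re


def logic_to_regex(logic, placeholder="xxx"):
    """Compile a template such as 'thanks xxx' into a pattern capturing the placeholder."""
    if placeholder not in logic:
        raise ValueError("word extraction logic must contain the placeholder")
    prefix, _, suffix = logic.partition(placeholder)
    return re.compile(re.escape(prefix) + r"(.+)" + re.escape(suffix))


def extract_candidate_words(remarks, logics):
    """Extract candidate words from remark data, keeping first-seen order without duplicates."""
    patterns = [logic_to_regex(logic) for logic in logics]
    candidates = []
    for remark in remarks:
        for pattern in patterns:
            # Assumption: remarks are short, so the whole remark must match the template.
            match = pattern.fullmatch(remark.strip())
            if match:
                word = match.group(1).strip()
                if word and word not in candidates:
                    candidates.append(word)
    return candidates


remarks = ["thanks Li Guoqing", "lunch", "thanks Li Guoqing"]
print(extract_candidate_words(remarks, ["thanks xxx"]))
```

Matching the whole remark keeps the sketch conservative for short remark texts; a partial match would extract more candidates at the cost of more noise.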
S903, counting the total number of occurrence times of the candidate word in a plurality of communities of the virtual resource transfer relationship network in a preset time period.
Each community in this embodiment includes a center object, and an associated object associated with the center object; the counting of the total number of occurrences of the candidate word in the plurality of communities of the virtual resource transfer relationship network within the preset time period may include: selecting communities of candidate words used by the center object from a plurality of communities of the virtual resource transfer relationship network; then counting the number of associated objects using the candidate words in a preset time period for each selected community; if the counted number is greater than a first preset number threshold, the total number of times that the candidate word appears in the communities is counted is increased.
S904, taking the counted total times as the first heat of the candidate words in the virtual resource transfer relation network, or taking the ratio of the counted total times to the preset time period as the first heat of the candidate words in the virtual resource transfer relation network.
In the embodiment of the application, the counted total times can be used as the first heat of the candidate word in the virtual resource transfer relation network, or the ratio of the counted total times to the preset time period can be used as the first heat of the candidate word in the virtual resource transfer relation network.
S905, if the first heat of the candidate word in the virtual resource transfer relation network is greater than a first preset heat threshold, determining the candidate word as a target word.
In the embodiment of the application, if the counted total number of times is taken as the first heat of the candidate word in the virtual resource transfer relation network, the candidate word can be determined as the target word when the total number of times is greater than a first preset number of times threshold; if the ratio of the counted total times to the preset time period is used as the first heat of the candidate word in the virtual resource transfer relation network, the candidate word can be determined to be the target word when the ratio is larger than a first preset ratio threshold.
S906, adding the target word into the word stock to update the word stock.
In this embodiment of the present application, the target words are multiple, and adding the target words to the word stock may include: sequencing a plurality of target words according to the sequence of the heat degree of the target words from big to small to obtain a sequencing sequence; then selecting a preset number of target words from the ordering sequence, and displaying the selected target words in a target word auditing interface; if the auditing operation passing the auditing of the target word is received in the target word auditing interface, the target word is added into the word stock in response to the auditing operation.
S907, extracting candidate derivative texts which are derived by taking target words as roots from remark data.
S908, counting the total number of occurrence times of the candidate derivative text in a plurality of communities of the virtual resource transfer relationship network in a preset time period.
In this embodiment, counting the total number of occurrences of the candidate derivative text in the multiple communities of the virtual resource transfer relationship network within the preset time period may include: selecting communities of which the center object uses the candidate derivative texts from a plurality of communities of the virtual resource transfer relationship network; then counting the number of associated objects using the candidate derived text in a preset time period for each selected community; if the counted number is greater than a second preset number threshold, the total number of times of occurrence of the candidate derived text in a plurality of communities is increased.
S909, taking the counted total times as the second heat of the candidate derivative text in the virtual resource transfer relation network, or taking the ratio of the counted total times to the preset time period as the second heat of the candidate derivative text in the virtual resource transfer relation network.
In the embodiment of the application, the counted total times can be used as the second heat of the candidate derivative text in the virtual resource transfer relation network, or the ratio of the counted total times to the preset time period can be used as the second heat of the candidate derivative text in the virtual resource transfer relation network.
S9010, if the second heat of the candidate derivative text in the virtual resource transfer relation network is greater than a second preset heat threshold, determining the candidate derivative text as a target derivative text.
In the embodiment of the application, if the counted total times are taken as the second heat of the candidate derived text in the virtual resource transfer relation network, when the total times are greater than a second preset times threshold, the candidate derived text can be determined as the target derived text; if the ratio of the counted total times to the preset time period is used as the second heat of the candidate derived text in the virtual resource transfer relation network, the candidate derived text can be determined to be the target derived text when the ratio is larger than a second preset ratio threshold.
S9011, generating new word extraction logic based on information in the target derivative text other than the candidate word.
S9012, adding the newly generated word extraction logic into the logic library to update the logic library again.
In this embodiment of the present application, a plurality of word extraction logics are newly generated, and updating the logic library again based on the newly generated word extraction logics may include: sorting the plurality of newly generated word extraction logics in descending order of the heat of the candidate derivative texts corresponding to the newly generated word extraction logics to obtain a sorted sequence; then selecting a preset number of word extraction logics from the sorted sequence, and displaying the selected word extraction logics in a logic auditing interface; if an audit operation indicating that the generated word extraction logic passes the audit is received in the logic auditing interface, the generated word extraction logic is added into the logic library in response to the audit operation.
For example, the word extraction logic acquired from the logic library updated before the current time is "thanks xxx", so the candidate word "Li Guoqing" is extracted based on the acquired word extraction logic "thanks xxx", and the candidate word "Li Guoqing" is taken as a target word based on the heat of the candidate word in the virtual resource transfer relationship network. The candidate derivative text "thank you Li Guoqing" can then be determined as the target derivative text based on the heat, in the virtual resource transfer relationship network, of the candidate derivative texts with "Li Guoqing" as the root, and new word extraction logic such as "thank you xxx" can be generated based on the "thank you" in the target derivative text "thank you Li Guoqing". The logic library may then be updated again based on the newly generated word extraction logic "thank you xxx".
In this way, both the collection of target words and the update of word extraction logic are realized. Through iterative update of the logic library, the method is applicable to scenarios with short text, such as remark data, and to scenarios with higher privacy requirements, so that target words are collected more comprehensively and accurately.
Referring to fig. 10, candidate words are extracted from remark data generated by the virtual resource transfer relationship network based on the word extraction logic contained in the most recently updated logic library. The first heat of each candidate word in the virtual resource transfer relationship network is calculated to determine whether the candidate word is a target word, and the target words are added into a word library (also referred to as an entity library). Candidate derivative texts derived by taking the target words as roots are then extracted from the remark data, the second heat of each candidate derivative text in the virtual resource transfer relation network is calculated to determine whether it is a target derivative text, new word extraction logic is generated based on the target derivative texts, and the new word extraction logic is added into the logic library. These steps are iterated in a loop until an iteration stop condition is reached (such as reaching a preset number of iterations or receiving an iteration stop request).
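The loop of fig. 10 can be sketched as follows. The sketch treats one full pass (extraction, heat calculation, and logic generation) as a caller-supplied function `step`, which is a hypothetical abstraction. The two stop conditions named above (a preset number of iterations and a stop request) are modeled directly; stopping early when a pass adds nothing new is an additional assumption, not something the application states.

```python
def iterate_logic_library(logic_library, word_library, step,
                          max_iterations=3, stop_requested=lambda: False):
    """Repeatedly update the word library and the logic library.

    `step(logics)` is a hypothetical callable that runs one pass and returns
    a pair (target_words, new_logics).
    """
    for _ in range(max_iterations):          # stop condition: preset iteration count
        if stop_requested():                 # stop condition: iteration stop request
            break
        target_words, new_logics = step(list(logic_library))
        added = False
        for word in target_words:
            if word not in word_library:
                word_library.append(word)
                added = True
        for logic in new_logics:
            if logic not in logic_library:
                logic_library.append(logic)
                added = True
        if not added:
            # Assumption: nothing new was found, so further passes would repeat this one.
            break
    return logic_library, word_library


def toy_step(logics):
    # Toy stand-in for one pass; a real pass would process remark data.
    if "thank you xxx" in logics:
        return ["Li Guoqing", "Wang"], []
    return ["Li Guoqing"], ["thank you xxx"]


print(iterate_logic_library(["thanks xxx"], [], toy_step))
```

In the toy run, the first pass finds one target word and one new logic, the second pass uses the new logic to find a further (made-up) word, and the third pass adds nothing and ends the loop.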
It should be noted that, the detailed description of S901 to S9012 in fig. 9 is referred to the previous embodiments, and will not be repeated here.
According to the embodiment of the application, the logic library is updated through iteration, so that more comprehensive and more accurate target words can be acquired, and the method is suitable for a scene with short text, such as remarking data; meanwhile, a large number of samples are not required to be collected to train a model to collect target words, the collection of the target words is not required to be combined with a context, the collection is simpler and more convenient, related privacy data are not involved, and the method is also suitable for scenes with higher requirements on privacy.
FIG. 11 is a block diagram of a data processing apparatus according to one embodiment of the present application. As shown in fig. 11, the data processing apparatus includes:
the obtaining module 1101 is configured to obtain word extraction logic from a logic library updated before the current moment if the collection request is received;
a first extraction module 1102 configured to extract candidate words from remark data generated by the virtual resource transfer relationship network based on the obtained word extraction logic;
a second extracting module 1103 configured to extract, if the candidate word is determined to be a target word based on a first heat of the candidate word in the virtual resource transfer relationship network, a candidate derivative text derived by taking the target word as a root from the remark data;
The generating and updating module 1104 is configured to generate new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and add the newly generated word extraction logic to the logic library to update the logic library again.
In one embodiment of the present application, based on the foregoing, the virtual resource transfer relationship network includes a plurality of communities; the apparatus further comprises:
the first statistics module is configured to count the total number of times that the candidate word appears in the communities within a preset time period;
the first determining module is configured to take the counted total times as the first heat of the candidate word in the virtual resource transfer relation network, or take the ratio of the counted total times to the preset time period as the first heat of the candidate word in the virtual resource transfer relation network.
In one embodiment of the present application, based on the foregoing, each community includes a center object, and an associated object associated with the center object; the first statistical module is specifically configured to:
selecting a community in which the candidate word is used by a center object from the plurality of communities;
Counting the number of associated objects using the candidate words in a preset time period for each selected community;
if the counted number is greater than a first preset number threshold, increasing the total number of times that the candidate word appears in the communities.
In one embodiment of the present application, based on the foregoing solution, the apparatus further includes:
the first detection module is configured to detect the magnitude relation between the first heat degree and a first preset heat degree threshold value of the candidate word in the virtual resource transfer relation network;
the second determining module is configured to determine the candidate word as a target word if the first heat of the candidate word in the virtual resource transfer relation network is greater than the first preset heat threshold;
and the first storage module is configured to add the target word into a word stock so as to update the word stock.
In an embodiment of the present application, based on the foregoing solution, the first storage module is specifically configured to:
displaying the target word in a target word auditing interface;
if an audit operation indicating that the target word passes the audit is received in the target word auditing interface, the target word is added into a word stock in response to the audit operation.
In one embodiment of the present application, based on the foregoing scheme, the target word is a plurality of target words; the first storage module is further specifically configured to:
sequencing a plurality of target words according to the sequence of the heat degree of the target words from big to small to obtain a sequencing sequence;
and selecting a preset number of target words from the ordered sequence, and displaying the selected target words in a target word auditing interface.
In one embodiment of the present application, based on the foregoing solution, the apparatus further includes:
the management module is configured to detect the security level of the party to be detected based on the target word contained in the word stock if the management request is received, so as to obtain a detection result; and performing management operation on the to-be-detected party based on the detection result.
In one embodiment of the present application, based on the foregoing, the virtual resource transfer relationship network includes a plurality of communities; the apparatus further comprises:
the second statistics module is configured to count the total number of times that the candidate derivative text appears in the communities within a preset time period;
and the third determining module is configured to take the counted total times as the second heat of the candidate derived text in the virtual resource transfer relation network, or take the ratio of the counted total times to the preset time period as the second heat of the candidate derived text in the virtual resource transfer relation network.
In one embodiment of the present application, based on the foregoing, each community includes a center object, and an associated object associated with the center object; the second statistical module is specifically configured to:
selecting a community in which the candidate derivative text is used by a center object from the plurality of communities;
counting, for each selected community, the number of associated objects that use the candidate derivative text within the preset time period;
and if the counted number is greater than a second preset number threshold, incrementing the total number of times that the candidate derivative text appears in the plurality of communities.
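A minimal sketch of the community-level counting described above is given here. The `Community` data layout is hypothetical, and the sketch assumes that each qualifying community adds exactly one to the total; the application does not state the increment explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Community:
    center: str                 # identifier of the center object
    associates: List[str]       # identifiers of the associated objects
    # Texts used by each object within the preset time period (hypothetical layout).
    usage: Dict[str, Set[str]] = field(default_factory=dict)


def count_occurrences(communities: List[Community], text: str, number_threshold: int) -> int:
    """Count communities whose center uses `text` and in which more than
    `number_threshold` associated objects also use it."""
    total = 0
    for community in communities:
        # Keep only communities in which the center object uses the text.
        if text not in community.usage.get(community.center, set()):
            continue
        users = sum(1 for obj in community.associates
                    if text in community.usage.get(obj, set()))
        if users > number_threshold:
            total += 1  # assumed increment of one per qualifying community
    return total
```

Filtering on the center object first means a text is only counted where it plausibly originates from the hub of a community and then spreads to its associated objects.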
In one embodiment of the present application, based on the foregoing solution, the apparatus further includes:
a second detection module configured to compare the second heat of the candidate derivative text in the virtual resource transfer relationship network with a second preset heat threshold;
a fourth determining module configured to determine the candidate derivative text as a target derivative text if the second heat of the candidate derivative text in the virtual resource transfer relationship network is greater than the second preset heat threshold;
and a second storage module configured to generate new word extraction logic based on information in the target derivative text other than the candidate word.
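One possible reading of "generating new word extraction logic based on information in the target derivative text other than the candidate word" is to keep the surrounding context and turn the position of the word into a capture slot. The sketch below assumes, purely for illustration, that word extraction logic is represented as a regular expression; the application does not prescribe this representation, and the function name `make_extraction_logic` is hypothetical.

```python
import re
from typing import Optional


def make_extraction_logic(derivative_text: str, word: str,
                          second_heat: float, heat_threshold: float) -> Optional[str]:
    """Return a regex built from the context around `word`, or None when the
    derivative text does not qualify as a target derivative text."""
    # Only texts whose second heat exceeds the threshold become target derivative texts.
    if second_heat <= heat_threshold:
        return None
    if word not in derivative_text:
        return None
    # Keep everything except the word itself and replace the word with a capture group.
    prefix, _, suffix = derivative_text.partition(word)
    return re.escape(prefix) + r"(\w+)" + re.escape(suffix)


# Hypothetical example: the context "pay for ... today" becomes the new logic.
pattern = make_extraction_logic("pay for apple today", "apple", 10.0, 5.0)
match = re.fullmatch(pattern, "pay for mango today")
```

Under this assumption, the newly generated logic can surface other words that appear in the same context, which is how the logic library would grow from one iteration to the next.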
In an embodiment of the present application, based on the foregoing solution, the second storage module is specifically configured to:
displaying the generated word extraction logic in a logic auditing interface;
and if an auditing operation indicating that the generated word extraction logic has passed the audit is received in the logic auditing interface, adding the generated word extraction logic to the logic library in response to the auditing operation.
In one embodiment of the present application, based on the foregoing scheme, there are a plurality of newly generated word extraction logics; the second storage module is further specifically configured to:
sorting the plurality of newly generated word extraction logics in descending order of the heat of their corresponding candidate derivative texts to obtain a sorted sequence;
and selecting a preset number of word extraction logics from the sorted sequence, and displaying the selected word extraction logics in the logic auditing interface.
In one embodiment of the present application, based on the foregoing, the virtual resource transfer relationship network includes a plurality of communities; the first extraction module 1102 is specifically configured to:
for each community, obtaining transfer data generated by virtual resource transfer based on an application program in the community, and obtaining data related to resource transfer remarks from the transfer data;
combining the data related to resource transfer remarks corresponding to each of the communities to obtain the remark data;
and extracting candidate words from the remark data based on the acquired word extraction logic.
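The per-community collection, merging, and extraction steps above can be sketched as follows. The record layout (a dictionary with a `"remark"` key) and the representation of word extraction logic as regular expressions with one capture group are assumptions made for illustration only.

```python
import re
from typing import Dict, Iterable, List, Set


def collect_remarks(transfers_by_community: Dict[str, List[dict]]) -> List[str]:
    """Merge the remark fields of all transfer records across communities."""
    remarks: List[str] = []
    for transfers in transfers_by_community.values():
        for record in transfers:
            remark = record.get("remark")  # hypothetical field name
            if remark:
                remarks.append(remark)
    return remarks


def extract_candidates(remarks: Iterable[str], patterns: Iterable[str]) -> Set[str]:
    """Apply every extraction pattern to every remark and collect captured words."""
    compiled = [re.compile(p) for p in patterns]
    found: Set[str] = set()
    for remark in remarks:
        for pattern in compiled:
            for match in pattern.finditer(remark):
                found.add(match.group(1))
    return found


# Hypothetical transfer data for two communities.
data = {
    "community_1": [{"amount": 5, "remark": "pay for apple"}, {"amount": 9}],
    "community_2": [{"amount": 3, "remark": "pay for mango now"}],
}
candidates = extract_candidates(collect_remarks(data), [r"pay for (\w+)"])
```

Merging the remarks before extraction means each extraction logic is applied once to the whole network rather than separately per community.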
In one embodiment of the present application, based on the foregoing solution, the obtaining module 1101 is specifically configured to:
acquiring an update record of a logic library, wherein the update record comprises a plurality of update times;
selecting an update time nearest to the current time from the plurality of update times;
and acquiring the word extraction logic from the logic library as updated at the selected update time.
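Selecting the most recent update can be sketched as below. The mapping from update time to a snapshot of the logic library, and the function name `latest_logic`, are hypothetical; the sketch also assumes that an update at exactly the current time is acceptable.

```python
from datetime import datetime
from typing import Dict, List


def latest_logic(snapshots: Dict[datetime, List[str]], now: datetime) -> List[str]:
    """Return the word extraction logic from the update nearest to, and not after, `now`."""
    # Consider only update times that are not later than the current time.
    earlier = [t for t in snapshots if t <= now]
    if not earlier:
        raise LookupError("the logic library has no update before the current time")
    return snapshots[max(earlier)]


# Hypothetical update record with three update times.
record = {
    datetime(2023, 1, 1): ["logic_a"],
    datetime(2023, 6, 1): ["logic_a", "logic_b"],
    datetime(2024, 1, 1): ["logic_c"],
}
current = latest_logic(record, datetime(2023, 10, 10))
```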
It should be noted that the apparatus provided in the foregoing embodiment and the method provided in the foregoing embodiment belong to the same concept, and the specific manner in which the respective modules and units perform their operations has been described in detail in the method embodiment.
An embodiment of the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the data processing method described above.
Fig. 12 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present application.
It should be noted that, the computer system 1200 of the electronic device shown in fig. 12 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 12, the computer system 1200 includes a central processing unit (Central Processing Unit, CPU) 1201 which can perform various appropriate actions and processes, such as performing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a random access Memory (Random Access Memory, RAM) 1203. In the RAM 1203, various programs and data required for the system operation are also stored. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other through a bus 1204. An Input/Output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1209, and/or installed from the removable medium 1211. When the computer program is executed by the central processing unit (CPU) 1201, the various functions defined in the system of the present application are performed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. The computer readable medium can be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer readable medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
Another aspect of the present application also provides a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the data processing method described above. The computer readable medium may be included in the electronic device described in the above embodiment, or may exist alone without being incorporated in the electronic device.
Another aspect of the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable medium. The processor of the computer device reads the computer instructions from the computer-readable medium, and the processor executes the computer instructions so that the computer device performs the data processing method provided in the above-described respective embodiments.
The foregoing is merely a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art may make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A method of data processing, comprising:
if an acquisition request is received, acquiring word extraction logic from a logic library updated before the current moment;
extracting candidate words from remark data generated by the virtual resource transfer relation network based on the obtained word extraction logic;
if the candidate word is determined to be a target word based on the first heat of the candidate word in the virtual resource transfer relation network, extracting a candidate derivative text which is derived by taking the target word as a root from the remark data;
generating new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and adding the newly generated word extraction logic into the logic library to update the logic library again.
2. The method of claim 1, wherein the virtual resource transfer relationship network comprises a plurality of communities; the method further comprises the steps of:
counting the total number of times of occurrence of the candidate words in the communities within a preset time period;
and taking the counted total times as the first heat of the candidate word in the virtual resource transfer relation network, or taking the ratio of the counted total times to the preset time period as the first heat of the candidate word in the virtual resource transfer relation network.
3. The method of claim 2, wherein each community includes a central object, and an associated object associated with the central object; the counting of the total number of occurrence times of the candidate words in the communities within the preset time period comprises the following steps:
selecting a community in which the candidate word is used by a center object from the plurality of communities;
counting the number of associated objects using the candidate words in a preset time period for each selected community;
and if the counted number is greater than a first preset number threshold, incrementing the total number of times that the candidate word appears in the plurality of communities.
4. The method according to claim 1, wherein the method further comprises:
comparing the first heat of the candidate word in the virtual resource transfer relationship network with a first preset heat threshold;
if the first heat of the candidate word in the virtual resource transfer relation network is greater than the first preset heat threshold, determining the candidate word as a target word;
and adding the target word into a word stock to update the word stock.
5. The method of claim 4, wherein the adding the target word to a word stock comprises:
displaying the target word in a target word auditing interface;
and if an auditing operation indicating that the target word has passed the audit is received in the target word auditing interface, adding the target word to the word stock in response to the auditing operation.
6. The method of claim 5, wherein there are a plurality of target words; the displaying the target word in the target word auditing interface comprises the following steps:
sorting the plurality of target words in descending order of heat to obtain a sorted sequence;
and selecting a preset number of target words from the sorted sequence, and displaying the selected target words in the target word auditing interface.
7. The method of claim 4, wherein after the adding the target word to a word stock, the method further comprises:
if a management request is received, carrying out security level detection on a party to be detected based on a target word contained in the word stock to obtain a detection result;
and performing management operation on the to-be-detected party based on the detection result.
8. The method of claim 1, wherein the virtual resource transfer relationship network comprises a plurality of communities; the method further comprises the steps of:
counting the total number of times of occurrence of the candidate derivative text in the communities within a preset time period;
and taking the counted total times as the second heat of the candidate derived text in the virtual resource transfer relation network, or taking the ratio of the counted total times to the preset time period as the second heat of the candidate derived text in the virtual resource transfer relation network.
9. The method of claim 1, wherein each community includes a central object and an associated object associated with the central object; the counting of the total number of times of occurrence of the candidate derivative text in the communities within the preset time period comprises the following steps:
selecting a community in which the candidate derivative text is used by a center object from the plurality of communities;
counting, for each selected community, the number of associated objects that use the candidate derivative text within the preset time period;
and if the counted number is greater than a second preset number threshold, incrementing the total number of times that the candidate derivative text appears in the plurality of communities.
10. The method of claim 1, wherein the generating new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relationship network comprises:
comparing the second heat of the candidate derivative text in the virtual resource transfer relationship network with a second preset heat threshold;
if the second heat degree of the candidate derivative text in the virtual resource transfer relation network is larger than the second preset heat degree threshold value, determining the candidate derivative text as a target derivative text;
generating new word extraction logic based on information in the target derivative text other than the candidate word.
11. The method of any one of claims 1 to 10, wherein the virtual resource transfer relationship network comprises a plurality of communities; the extracting candidate words from remark data generated by the virtual resource transfer relation network based on the obtained word extraction logic comprises the following steps:
for each community, obtaining transfer data generated by virtual resource transfer based on an application program in the community, and obtaining data related to resource transfer remarks from the transfer data;
combining data related to the resource transfer remarks corresponding to the communities to obtain remark data;
and extracting candidate words from the remark data based on the acquired word extraction logic.
12. The method according to any one of claims 1 to 10, wherein the acquiring word extraction logic from a logic library updated before the current moment comprises:
acquiring an update record of a logic library, wherein the update record comprises a plurality of update times;
selecting an update time nearest to the current time from the plurality of update times;
and acquiring the word extraction logic from the logic library as updated at the selected update time.
13. A data processing apparatus, comprising:
the acquisition module is configured to acquire word extraction logic from a logic library updated before the current moment if an acquisition request is received;
a first extraction module configured to extract candidate words from remark data generated by the virtual resource transfer relationship network based on the obtained word extraction logic;
the second extraction module is configured to extract candidate derivative text derived by taking the target word as a root from the remark data if the candidate word is determined to be the target word based on the first heat of the candidate word in the virtual resource transfer relation network;
and the generation and updating module is configured to generate new word extraction logic based on the second heat of the candidate derivative text in the virtual resource transfer relation network, and add the newly generated word extraction logic into the logic library so as to update the logic library again.
14. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs that, when executed by the electronic device, cause the electronic device to implement the data processing method of any of claims 1-12.
15. A computer readable medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data processing method according to any one of claims 1 to 12.
16. A computer program product comprising computer instructions which, when executed by a processor, implement a data processing method as claimed in any one of claims 1 to 12.
CN202311308352.9A 2023-10-10 2023-10-10 Data processing method and device, equipment and medium Pending CN117370538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311308352.9A CN117370538A (en) 2023-10-10 2023-10-10 Data processing method and device, equipment and medium


Publications (1)

Publication Number Publication Date
CN117370538A true CN117370538A (en) 2024-01-09

Family

ID=89392148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311308352.9A Pending CN117370538A (en) 2023-10-10 2023-10-10 Data processing method and device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117370538A (en)


Legal Events

Date Code Title Description
PB01 Publication