CN111079010B

CN111079010B - Data processing method, device and system

Info

Publication number: CN111079010B
Application number: CN201911274881.5A
Authority: CN
Inventors: 冯泽亮; 祝捷; 王雯雯; 李薇; 赵坤; 杨龙; 张晓丽; 王滋怡; 秦刚
Original assignee: State Grid Corp of China SGCC; State Grid Sichuan Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Sichuan Electric Power Co Ltd
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2023-03-31
Anticipated expiration: 2039-12-12
Also published as: CN111079010A

Abstract

The invention discloses a data processing method, device and system. The method comprises the following steps: sending text topic data to a plurality of clients, and receiving text reply data corresponding to the text topic data fed back by each client; determining a plurality of target keywords corresponding to the text reply data and a one-dimensional weight value corresponding to each target keyword; acquiring a plurality of preset reference text data, and determining a reference keyword corresponding to each preset reference text data and a one-dimensional weight value corresponding to each reference keyword; and determining the target preset reference text data with the highest correlation degree with the text reply data according to the target keywords and the one-dimensional weight values corresponding to the target keywords as well as the reference keywords and the one-dimensional weight values corresponding to the reference keywords.

Description

Data processing method, device and system

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, and system.

Background

In the related art, content recommendation for a user involves a trade-off between recommendation accuracy and computation speed. This drawback is more pronounced in particular technical and application areas, where the area has distinctive characteristics and the computing resources that can be allocated are relatively scarce.

Disclosure of Invention

In view of the foregoing problems, the present invention provides a data processing method, apparatus and corresponding system.

According to a first aspect of the embodiments of the present invention, there is provided a data processing method for a server, including:

sending text topic data to a plurality of clients, and receiving text reply data corresponding to the text topic data fed back by each client;

determining a plurality of target keywords corresponding to the text reply data and a one-dimensional weight value corresponding to each target keyword;

acquiring a plurality of preset reference text data, and determining a reference keyword corresponding to each preset reference text data and a one-dimensional weight value corresponding to each reference keyword;

and determining target preset reference text data with the highest correlation degree with the text reply data according to the target keywords and the one-dimensional weight values corresponding to the target keywords as well as the reference keywords and the one-dimensional weight values corresponding to the reference keywords.

In one embodiment, preferably, determining a plurality of target keywords corresponding to the text reply data and a one-dimensional weight value corresponding to each target keyword includes:

performing word segmentation processing on each text reply data to obtain a plurality of target keywords;

and determining a one-dimensional weight value corresponding to each target keyword according to the occurrence frequency of each target keyword in the text reply data.
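The two steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the regular-expression tokenizer is a stand-in for a real word segmenter (Chinese text would need a dedicated one), the stop-word set is hypothetical, and relative frequency is only one plausible reading of "according to the occurrence frequency". The sample replies are made up.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "is", "and", "to", "of"}

def segment(text):
    """Stand-in word segmentation: lowercase alphanumeric tokens minus stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]

def keyword_weights(replies):
    """Map each target keyword to one scalar weight (its relative frequency)."""
    counts = Counter()
    for reply in replies:
        counts.update(segment(reply))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

# Made-up reply data from two clients.
replies = ["The meter is clean", "clean the meter and the panel"]
weights = keyword_weights(replies)
```

Each keyword ends up with a single number, which is what "one-dimensional weight value" is taken to mean in this sketch.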

In one embodiment, preferably, after obtaining the plurality of target keywords, the method further includes:

acquiring a keyword storage word bank, wherein a plurality of preset keywords and coupling degrees among different preset keywords are stored in the keyword storage word bank;

determining whether a first target keyword and a second target keyword which can be combined exist in the plurality of target keywords according to the coupling degree between the different preset keywords;

when a first target keyword and a second target keyword which can be combined exist, combining the first target keyword and the second target keyword.

In one embodiment, preferably, determining whether there are a first target keyword and a second target keyword that can be merged in the plurality of target keywords according to the coupling degree between the different preset keywords includes:

acquiring a target preset keyword pair with the coupling degree within a preset range from the keyword storage word library;

judging whether the target preset keyword pair exists in the target keywords or not;

and when the target preset keyword pairs exist in the target keywords, determining that a first keyword and a second keyword which can be combined exist in the target keywords.

In one embodiment, preferably, the method further comprises:

displaying preset keywords in the keyword storage word library in a preset display mode;

receiving a merging processing operation, input by a user, for merging a first preset keyword and a second preset keyword, merging and displaying the first preset keyword and the second preset keyword according to the merging processing operation, and adding 1 to the coupling degree of the first preset keyword and the second preset keyword in the keyword storage word library; or

receiving a separation processing operation, input by a user, for separating a first preset keyword and a second preset keyword that are displayed merged, displaying the first preset keyword and the second preset keyword separately according to the separation processing operation, and reducing the coupling degree of the first preset keyword and the second preset keyword in the keyword storage word library by 1.

In one embodiment, preferably, the method further comprises:

when the coupling degree of the first preset keyword and the second preset keyword in the keyword storage word library is greater than a first preset threshold value, the first preset keyword and the second preset keyword are combined and displayed;

and when the coupling degree of the first preset keyword and the second preset keyword in the keyword storage word library is smaller than a second preset threshold value, the first preset keyword and the second preset keyword are displayed in a separated mode.

In one embodiment, preferably, determining target preset reference text data with the highest correlation degree with the text reply data according to the target keyword and the one-dimensional weight value corresponding to the target keyword, and the reference keyword and the one-dimensional weight value corresponding to the reference keyword comprises:

calculating the correlation degree between the text reply data and the preset reference text data according to the target keyword and the one-dimensional weight value corresponding to the target keyword as well as the reference keyword and the one-dimensional weight value corresponding to the reference keyword;

and determining the preset reference text data with the highest correlation degree as the target preset reference text data.

In one embodiment, preferably, the acquiring a plurality of preset reference text data includes:

storing preset multimedia data with different formats;

and converting the preset multimedia data with different formats into text data, and taking the text data as the preset reference text data.

According to a second aspect of an embodiment of the present invention, there is provided a data processing apparatus for a server, including:

a memory and a processor;

the memory is used for storing data used when the processor executes a computer program;

the processor is configured to execute a computer program to implement the method as described in the first aspect or any embodiment of the first aspect.

According to a third aspect of embodiments of the present invention, there is provided a data processing system including:

a server;

a plurality of clients coupled with the server;

the server sends text topic data to a plurality of clients, receives text reply data corresponding to the text topic data fed back by each client, and determines a plurality of target keywords corresponding to the text reply data and a one-dimensional weight value corresponding to each target keyword; acquiring a plurality of preset reference text data, and determining a reference keyword corresponding to each preset reference text data and a one-dimensional weight value corresponding to each reference keyword; and determining target preset reference text data with the highest correlation degree with the text reply data according to the target keywords and the one-dimensional weight values corresponding to the target keywords as well as the reference keywords and the one-dimensional weight values corresponding to the reference keywords.

In the embodiment of the invention, the server sends the text topic data to a plurality of clients, and the clients return the corresponding text reply data to the server after processing. The server determines the keywords and corresponding weights of the text reply data, and then compares them with the keywords and weights of the preset reference text data to determine the target preset reference text data with the highest relevance to the text reply data. In this way, based on the feedback of most users, the reference text data most relevant to the users' feedback is selected from a plurality of candidate reference text data, which improves the accuracy of content recommendation while consuming fewer computing resources.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 shows a flow diagram of a data processing method according to one embodiment of the invention.

Fig. 2A shows a flow diagram of a data processing method according to another embodiment of the invention.

Fig. 2B shows a flow diagram of a data processing method according to yet another embodiment of the invention.

Fig. 3 shows a flowchart of step S202 in a data processing method according to another embodiment of the present invention.

Fig. 4 shows a flow diagram of a data processing method according to a further embodiment of the invention.

Fig. 5 shows a flow diagram of a data processing method according to yet another embodiment of the invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.

Some of the flows described in the present specification and claims and in the above figures include a number of operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 merely distinguish the various operations from one another and do not by themselves represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the descriptions "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc.; they do not represent a sequential order, nor do they require the "first" and "second" items to be of different types.

Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

Fig. 1 shows a flow diagram of a data processing method according to an embodiment of the invention.

As shown in fig. 1, a data processing method according to an embodiment of the present invention is for a server, the data processing method including steps S101-S104:

Step S101, sending text topic data to a plurality of clients, and receiving text reply data corresponding to the text topic data fed back by each client. Those skilled in the art will understand that the text topic data can be any text-type data that can be processed or analyzed by a text processing program, such as data in txt, bat, csv, xml and similar formats. Those skilled in the art will also understand that text-type data is widely used in various Internet scenarios, including but not limited to social networks, topic forums, comment areas of app stores, electronic questionnaires, and the like. Any of the specific text data types and application scenarios described above falls within the scope of the present invention. Unless otherwise specified, "text", "text data" and/or "text topic data" in the present invention carry the meanings explained above. After the server sends the data to the plurality of clients, each client can present the data to its user; the user's feedback yields the text reply data, which is returned to the server.

Step S102, a plurality of target keywords corresponding to the text reply data and a one-dimensional weight value corresponding to each target keyword are determined.

Step S103, a plurality of preset reference text data are obtained, and a reference keyword corresponding to each preset reference text data and a one-dimensional weight value corresponding to each reference keyword are determined.

In one embodiment, preferably, the obtaining a plurality of preset reference text data includes:

storing preset multimedia data with different formats; the multimedia data may be text data, video data, audio data, etc.

converting the preset multimedia data with different formats into text data, and taking the text data as the preset reference text data. The conversion of multimedia data to text data can be accomplished by any means known in the art. For example, audio data can be converted into text data by an audio-to-text converter, such as those offered by companies like iFLYTEK, and audio or subtitles can be extracted from video data and converted into text data.
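A minimal sketch of this conversion step, under stated assumptions: the `format` and `content` keys, the `to_reference_text` helper and the placeholder converters are all hypothetical names introduced for illustration. A real system would plug a speech-recognition or subtitle-extraction tool into the `converters` table.

```python
def to_reference_text(item, converters):
    """Return the text form of one stored multimedia item.

    item: dict with hypothetical keys "format" and "content".
    converters: dict mapping a format name to a callable that returns text.
    """
    if item["format"] == "text":
        return item["content"]  # already text, no conversion needed
    try:
        convert = converters[item["format"]]
    except KeyError:
        raise ValueError(f"no converter for format {item['format']!r}")
    return convert(item["content"])

# Placeholder converters for illustration only; real ones would call an
# audio-to-text or subtitle-extraction tool.
converters = {
    "audio": lambda data: data["transcript"],
    "video": lambda data: data["subtitles"],
}

# Made-up stored items in three formats.
stored = [
    {"format": "text", "content": "meter cleaning guide"},
    {"format": "audio", "content": {"transcript": "how to read the meter"}},
    {"format": "video", "content": {"subtitles": "panel safety basics"}},
]
reference_texts = [to_reference_text(item, converters) for item in stored]
```

Keeping the converters in a lookup table means a new media format only needs one new entry; the keyword extraction downstream sees plain text in every case.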

Step S104, determining the target preset reference text data with the highest relevance to the text reply data according to the target keywords and the one-dimensional weight values corresponding to the target keywords, as well as the reference keywords and the one-dimensional weight values corresponding to the reference keywords.

In one embodiment, preferably, determining target preset reference text data with the highest relevance to the text reply data according to the target keyword and the one-dimensional weight value corresponding to the target keyword, and the reference keyword and the one-dimensional weight value corresponding to the reference keyword, includes:

calculating the correlation between the text reply data and preset reference text data according to the target keyword and the one-dimensional weight value corresponding to the target keyword as well as the reference keyword and the one-dimensional weight value corresponding to the reference keyword; and determining the preset reference text data with the highest correlation degree as target preset reference text data.

The correlation between the text reply data and the preset reference text data can be calculated from the cosine distance between the weighted target keywords and the weighted reference keywords, a smaller distance indicating a higher correlation. Of course, other correlation calculation methods known in the related art can also be adopted.
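The cosine-based calculation can be sketched as follows. This is a minimal illustration that treats each keyword-to-weight mapping as a sparse vector and scores by cosine similarity (one minus the cosine distance); the function names and the sample weights are assumptions, not taken from the source.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)

def most_relevant(target, references):
    """Return the name of the reference text with the highest similarity."""
    return max(references, key=lambda name: cosine_similarity(target, references[name]))

# Made-up weights for the reply data and two candidate reference texts.
target = {"meter": 0.4, "clean": 0.4, "panel": 0.2}
references = {
    "doc_a": {"meter": 0.5, "clean": 0.5},
    "doc_b": {"billing": 0.7, "panel": 0.3},
}
best = most_relevant(target, references)
```

Here `doc_a` shares two heavily weighted keywords with the replies and is selected; `doc_b` shares only one lightly weighted keyword.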

In the embodiment, the server sends the text topic data to the plurality of clients, the clients return the corresponding text reply data to the server after processing, the server determines keywords and corresponding weights of the text reply data, and then compares the keywords and the weights with those of the preset reference text data to determine the target preset reference text data with the highest relevance to the text reply data.

Fig. 2A shows a flow diagram of a data processing method according to another embodiment of the invention.

As shown in fig. 2A, in one embodiment, preferably, after obtaining the plurality of target keywords, the method further includes steps S201-S203:

step S201, a keyword storage lexicon is obtained, wherein the keyword storage lexicon stores a plurality of preset keywords and coupling degrees between different preset keywords.

Step S202, determining whether a first target keyword and a second target keyword which can be combined exist in the plurality of target keywords according to the coupling degree between different preset keywords.

Step S203, when the first target keyword and the second target keyword which can be combined exist, combining the first target keyword and the second target keyword.

In this embodiment, a keyword storage lexicon may be set, and some replaceable keywords, such as words with the same meaning, similar meaning, or opposite meaning, may be stored in the lexicon, so that when the target keyword and the weight are calculated, the words with the same meaning, similar meaning, or opposite meaning may be used as one keyword for calculation, thereby improving accuracy and efficiency.

As shown in fig. 3, in one embodiment, the step S202 preferably includes steps S301 to S303:

step S301, acquiring a target preset keyword pair with the coupling degree within a preset range from a keyword storage word library; the preset keyword pair is two preset keywords.

Step S302, judging whether a target preset keyword pair exists in a plurality of target keywords;

step S303, when a target preset keyword pair exists in the target keywords, determining that a first keyword and a second keyword which can be combined exist in the target keywords.

For example, suppose the preset keyword pair consists of two synonymous keywords that both mean "clean". If both keywords are present among the target keywords, they can be merged into one target keyword, and their weights are then counted together. According to the present invention, steps S301-S303 may be performed in a loop by a computer program until no first and second keywords that can be merged remain.
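Steps S301-S303 and the loop can be sketched as follows. This is a minimal illustration under assumptions: the coupling degrees are held in a dict keyed by keyword pairs, the preset range is the closed interval `[low, high]`, and the sample keywords (including the synonym "tidy") and their values are made up.

```python
def merge_keywords(weights, coupling, low, high):
    """Merge target keywords whose stored coupling degree lies in [low, high].

    weights: dict keyword -> weight
    coupling: dict frozenset({kw1, kw2}) -> coupling degree
    Returns a dict mapping a tuple of merged keywords to their summed weight.
    """
    # S301: preset keyword pairs whose coupling degree is within the range.
    pairs = sorted(tuple(sorted(p)) for p, c in coupling.items() if low <= c <= high)
    groups = {(k,): w for k, w in weights.items()}
    merged = True
    while merged:  # repeat until no mergeable pair remains
        merged = False
        for a, b in pairs:
            # S302: are both keywords of the pair present among the targets?
            group_a = next((g for g in groups if a in g), None)
            group_b = next((g for g in groups if b in g), None)
            if group_a is not None and group_b is not None and group_a != group_b:
                # S303 / S203: merge them and count their weights together.
                total = groups.pop(group_a) + groups.pop(group_b)
                groups[tuple(sorted(group_a + group_b))] = total
                merged = True
                break
    return groups

# Made-up target keyword weights and lexicon entries.
weights = {"clean": 0.4, "tidy": 0.2, "meter": 0.4}
coupling = {frozenset({"clean", "tidy"}): 5, frozenset({"clean", "meter"}): 0}
result = merge_keywords(weights, coupling, low=3, high=10)
```

With these values, "clean" and "tidy" collapse into one keyword carrying their combined weight, while "meter" stays on its own because its coupling degree with "clean" is outside the range.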

As shown in fig. 4, in one embodiment, preferably, the method further includes steps S401 to S403:

step S401, displaying preset keywords in the keyword storage word library in a preset display mode. The keywords in the keyword storage lexicon can be stored in a graph form, each keyword corresponds to one node in the graph, and the coupling degree between any two keywords is stored on the edge between any two nodes.
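The graph form of the lexicon can be sketched as follows. This is a minimal illustration, not the source's data structure: the `KeywordLexicon` class and its method names are hypothetical, and treating a missing edge as a coupling degree of 0 is an assumption.

```python
class KeywordLexicon:
    """Undirected graph: keywords are nodes, coupling degrees sit on edges."""

    def __init__(self):
        self.nodes = set()
        self.edges = {}  # frozenset({kw1, kw2}) -> coupling degree

    def add_keyword(self, keyword):
        self.nodes.add(keyword)

    def set_coupling(self, a, b, value):
        if a == b:
            raise ValueError("coupling is defined between two different keywords")
        self.nodes.update((a, b))
        self.edges[frozenset((a, b))] = value

    def coupling(self, a, b):
        # Assumption: keywords with no stored edge have coupling degree 0.
        return self.edges.get(frozenset((a, b)), 0)

    def pairs_in_range(self, low, high):
        """Keyword pairs whose coupling degree lies in [low, high]."""
        return sorted(
            tuple(sorted(pair)) for pair, c in self.edges.items() if low <= c <= high
        )

# Made-up lexicon content.
lexicon = KeywordLexicon()
lexicon.set_coupling("clean", "tidy", 4)
lexicon.set_coupling("clean", "meter", 0)
candidates = lexicon.pairs_in_range(3, 10)
```

Keying each edge by an unordered pair makes the coupling degree symmetric, so looking up ("tidy", "clean") gives the same value as ("clean", "tidy").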

Step S402, receiving a merging processing operation which is input by a user and merges a first preset keyword and a second preset keyword, merging and displaying the first preset keyword and the second preset keyword according to the merging processing operation, and adding 1 to the coupling degree of the first preset keyword and the second preset keyword in a keyword storage lexicon; or

Step S403, receiving a separation processing operation input by the user to separate the merged and displayed first preset keyword from the second preset keyword, separately displaying the first preset keyword and the second preset keyword according to the separation processing operation, and subtracting 1 from the coupling degree of the first preset keyword and the second preset keyword in the keyword storage lexicon.

The preset keywords can be displayed in the form of a histogram, with one preset keyword corresponding to one bar. A user can view the histogram of the preset keywords and edit it. For example, when the user judges that the words appearing in two bars in the view are replaceable words, the two words can be merged and presented together in the view by dragging; at the same time, the keyword storage lexicon is updated and the coupling degree of the two words is increased by 1. When the user judges that two words presented in the same bar in the view are not replaceable words, the two words can be presented separately in the view by dragging; at the same time, the keyword storage lexicon is updated and the coupling degree of the two words is decreased by 1. In a further embodiment, the height of a bar represents the number of merged preset keywords, and the text of the preset keywords is displayed above the bar.
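The merge and separate operations can be sketched as follows. This is a minimal illustration without any actual drawing or drag handling: the `KeywordView` class is hypothetical, each bar is modeled as a set of keywords, and counting an operation only when it changes the view is an assumption.

```python
from collections import defaultdict

class KeywordView:
    """Bars of preset keywords plus the coupling degrees they update."""

    def __init__(self, keywords):
        self.bars = [{k} for k in keywords]   # one bar per keyword at first
        self.coupling = defaultdict(int)      # frozenset pair -> coupling degree

    def _bar_of(self, keyword):
        return next(bar for bar in self.bars if keyword in bar)

    def merge(self, a, b):
        """User drags the bar of b onto the bar of a (step S402)."""
        bar_a, bar_b = self._bar_of(a), self._bar_of(b)
        if bar_a is not bar_b:
            bar_a |= bar_b
            self.bars = [bar for bar in self.bars if bar is not bar_b]
            self.coupling[frozenset((a, b))] += 1

    def separate(self, a, b):
        """User drags b out of the bar it shares with a (step S403)."""
        bar = self._bar_of(a)
        if a != b and b in bar:
            bar.discard(b)
            self.bars.append({b})
            self.coupling[frozenset((a, b))] -= 1

    def heights(self):
        """Bar height = number of merged preset keywords in that bar."""
        return [len(bar) for bar in self.bars]

# Made-up keywords and one merge followed by one separation.
view = KeywordView(["clean", "tidy", "meter"])
view.merge("clean", "tidy")
heights_after_merge = view.heights()
view.separate("clean", "tidy")
```

After the merge the view has two bars, one of height 2; the later separation restores three bars and brings the pair's coupling degree back to 0.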

As shown in fig. 2B, in another preferred embodiment, after the step S203, the method further includes:

Step S204, displaying the target keywords in a preset display mode, and processing the target keywords according to the input of the user.

The target keywords may be displayed in the form of a histogram, with one target keyword or one merged target keyword corresponding to one bar. The height of the bar is the weight of the target keyword, and the weight value is optionally displayed above or on the bar. If the bar corresponds to an independent target keyword, that target keyword is displayed above the bar; if the bar corresponds to a merged target keyword, the first target keyword and the second target keyword before merging are displayed above the bar. The user can view the histogram of the target keywords or edit it. If the user judges that the words appearing in two bars in the view are replaceable words, the two words can be merged and presented together in the view by dragging; the height of the merged bar is the sum of the weights of the two words, the keyword storage lexicon is updated at the same time, and the coupling degree of the two words is increased by 1. When the user judges that two words presented in the same bar in the view are not replaceable words, the two words can be presented separately in the view by dragging; the heights of the two separated bars are the respective weights of the two words, the keyword storage lexicon is updated at the same time, and the coupling degree of the two words is decreased by 1. Through step S204, the target keywords can be made more accurate, and so can the calculation of the relevance.

As shown in fig. 5, in one embodiment, preferably, the method further includes steps S501 to S502:

step S501, when the coupling degree of a first preset keyword and a second preset keyword in a keyword storage lexicon is larger than a first preset threshold value, the first preset keyword and the second preset keyword are merged and displayed;

step S502, when the coupling degree of the first preset keyword and the second preset keyword in the keyword storage lexicon is smaller than a second preset threshold value, the first preset keyword and the second preset keyword are displayed in a separated mode.

In this embodiment, the preset keywords may be automatically combined and displayed or separately displayed according to the value of the degree of coupling between the preset keywords, so that the user can view and edit the preset keywords conveniently.
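The threshold rule of steps S501-S502 can be sketched as follows. This is a minimal illustration under assumptions the source does not state: the second threshold is taken to be no greater than the first, and a coupling degree between the two thresholds leaves the current display state unchanged. The function and parameter names are hypothetical.

```python
def should_display_merged(coupling_degree, currently_merged, merge_threshold, split_threshold):
    """Decide whether two preset keywords are shown merged.

    Assumes split_threshold <= merge_threshold.
    """
    if coupling_degree > merge_threshold:
        return True            # S501: strongly coupled, show merged
    if coupling_degree < split_threshold:
        return False           # S502: weakly coupled, show separately
    return currently_merged    # assumption: in between, keep the current state

# Made-up thresholds: merge above 5, separate below 2.
states = [
    should_display_merged(6, False, merge_threshold=5, split_threshold=2),
    should_display_merged(1, True, merge_threshold=5, split_threshold=2),
    should_display_merged(3, True, merge_threshold=5, split_threshold=2),
    should_display_merged(3, False, merge_threshold=5, split_threshold=2),
]
```

Using two thresholds rather than one leaves a band in which a single user edit does not flip the display back and forth.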

According to a second aspect of the embodiments of the present invention, there is provided a data processing apparatus for a server, including:

a memory and a processor;

The processor is configured to:

In one embodiment, preferably, the method further comprises:

receiving a merging processing operation which is input by a user and is used for merging a first preset keyword and a second preset keyword, merging and displaying the first preset keyword and the second preset keyword according to the merging processing operation, and adding 1 to the coupling degree of the first preset keyword and the second preset keyword in a keyword storage word library; or

Receiving a separation processing operation which is input by a user and separates a first preset keyword and a second preset keyword which are combined and displayed, according to the separation processing operation, separating and displaying the first preset keyword and the second preset keyword, and reducing the coupling degree of the first preset keyword and the second preset keyword in the keyword storage word library by 1.

In one embodiment, preferably, the method further comprises:

when the coupling degree of the first preset keyword and the second preset keyword in the keyword storage word library is greater than a first preset threshold value, combining and displaying the first preset keyword and the second preset keyword;

storing preset multimedia data with different formats;

a server;

a plurality of clients coupled with the server;

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data processing method for a server, comprising:

determining target preset reference text data with the highest correlation degree with the text reply data according to the target keywords and the one-dimensional weight values corresponding to the target keywords as well as the reference keywords and the one-dimensional weight values corresponding to the reference keywords;

specifically, word segmentation processing is carried out on each text reply data to obtain a plurality of target keywords;

2. The data processing method of claim 1, wherein after obtaining the plurality of target keywords, the method further comprises:

3. The data processing method of claim 2, wherein determining whether there are a first target keyword and a second target keyword that can be merged in the plurality of target keywords according to the coupling degree between the different preset keywords comprises:

4. The data processing method of claim 2, wherein the method further comprises:

5. The data processing method of claim 4, wherein the method further comprises:

6. The data processing method of claim 1, wherein determining the target preset reference text data with the highest relevance to the text reply data according to the target keyword and the one-dimensional weight value corresponding thereto, and the reference keyword and the one-dimensional weight value corresponding thereto, comprises:

7. The data processing method according to claim 1, wherein the obtaining a plurality of predetermined reference text data comprises:

storing preset multimedia data with different formats;

8. A data processing apparatus for a server, comprising:

a memory and a processor;

the processor is configured to execute a computer program to implement the method of any one of claims 1 to 7.

9. A data processing system, comprising:

a server;

a plurality of clients coupled with the server;

the server sends text topic data to a plurality of clients, receives text reply data corresponding to the text topic data fed back by each client, and determines a plurality of target keywords corresponding to the text reply data and a one-dimensional weight value corresponding to each target keyword; acquiring a plurality of preset reference text data, and determining a reference keyword corresponding to each preset reference text data and a one-dimensional weight value corresponding to each reference keyword; determining target preset reference text data with the highest correlation degree with the text reply data according to the target keywords and the one-dimensional weight values corresponding to the target keywords as well as the reference keywords and the one-dimensional weight values corresponding to the reference keywords, and specifically, performing word segmentation processing on each text reply data to obtain a plurality of target keywords; and determining a one-dimensional weight value corresponding to each target keyword according to the occurrence frequency of each target keyword in the text reply data.