CN110597989A - Data processing method and device and computer storage medium - Google Patents

Data processing method and device and computer storage medium

Info

Publication number
CN110597989A
Authority
CN
China
Prior art keywords
data
knowledge
text
user
terminal application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910819017.2A
Other languages
Chinese (zh)
Inventor
张振伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910819017.2A
Publication of CN110597989A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to the technical field of data processing and discloses a data processing method, a data processing device and a computer storage medium, which are used for summarizing the knowledge points a user has read so that the user can search and summarize them after reading. The method comprises the following steps: receiving text data sent by a terminal application, wherein the text data is text data displayed in the terminal application; fragmenting the text data to obtain fragment data; comparing the fragment data with pre-stored data in a knowledge base, determining knowledge data from all the fragment data, and determining a classification label of each piece of knowledge data; and sending the knowledge data and the classification labels of the knowledge data to the terminal application.

Description

Data processing method and device and computer storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, and a computer storage medium.
Background
With the development of network technology, people have become accustomed to browsing information on the internet. Information on the internet is characterized by large data volume, fast updating and strong timeliness, and a large amount of network information is generated every day. To make it easier for users to obtain information, an application pushes information flow products to the user through a client or a website page. Meanwhile, the application records the user's reading content, reading duration, reading time and other behaviors in order to analyze the user's reading preferences, and then recommends content that the user may like by matching those preferences. At present, however, a user generally forgets the information flow products pushed by the application soon after reading them and cannot form a knowledge system. As a result, the user feels that browsing information flow products wastes time, and lacks a sense of trust in and value from them.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device and a computer storage medium, which are used for summarizing the knowledge points a user has read so that the user can search and summarize them after reading.
According to a first aspect of embodiments of the present application, there is provided a data processing method, including:
receiving text data sent by a terminal application, wherein the text data is text data displayed in the terminal application;
fragmenting the text data to obtain fragment data;
comparing the fragment data with pre-stored data in a knowledge base, determining knowledge data from all the fragment data, and determining a classification label of each knowledge data;
and sending the knowledge data and the classification labels of the knowledge data to the terminal application.
According to a second aspect of embodiments of the present application, there is provided a data processing method, including:
the terminal application responds to a data processing request of a user and records the displayed text data;
the terminal application determines knowledge data and a classification label of the knowledge data, wherein the knowledge data and the classification label of the knowledge data are determined by fragmenting the text data to obtain fragment data and comparing the fragment data with pre-stored data in a knowledge base;
and the terminal application displays the knowledge data under the classification label of the knowledge data.
In an alternative embodiment, the displaying the knowledge data under the classification label of the knowledge data includes:
and responding to a knowledge display request of a user, displaying the knowledge data under the classification label of the knowledge data, and simultaneously displaying an access link of the text data corresponding to the knowledge data.
According to a third aspect of embodiments of the present application, there is provided a data processing apparatus, the apparatus including:
the transceiver unit is used for receiving text data sent by a terminal application, wherein the text data is text data displayed in the terminal application;
the fragmentation unit is used for fragmenting the text data to obtain fragment data;
the comparison unit is used for comparing the fragment data with prestored data in a knowledge base, determining knowledge data from all the fragment data and determining a classification label of each knowledge data;
the transceiving unit is further configured to send the knowledge data and the classification label of the knowledge data to the terminal application.
In an optional embodiment, the transceiver unit is specifically configured to receive N pieces of text data sent by the terminal application and user behavior data of each piece of text data, where N is greater than or equal to 1;
the fragmentation unit is further used for determining valid data from the N pieces of text data according to the user behavior data, and, for any piece of valid data, fragmenting the valid data to obtain fragment data.
In an optional embodiment, the pre-stored data in the knowledge base is stored in a classified manner according to set rules; the comparison unit is specifically configured to:
for any piece of fragment data, performing similarity matching on the piece of fragment data and all pre-stored data in a knowledge base, determining a highest similarity value, and determining the classification of the pre-stored data with the highest similarity value with the piece of fragment data;
and taking the fragment data with the highest similarity value larger than the similarity threshold value as knowledge data, and taking the classification label of the prestored data with the highest similarity value with the knowledge data as the classification label of the knowledge data.
In an optional embodiment, the transceiver unit is specifically configured to:
receiving a knowledge data acquisition request sent by the terminal application;
and sending a knowledge data acquisition response to the terminal application, wherein the knowledge data acquisition response comprises the knowledge data and the classification labels of the knowledge data.
In an optional embodiment, the apparatus further includes a statistical unit, configured to: counting the quantity of knowledge data under each classification label in a historical time period; determining the classification label with the maximum number of corresponding knowledge data, and determining a user label based on the classification label with the maximum number of knowledge data;
the transceiving unit is further configured to send the user tag to the terminal application.
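The statistical unit described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the record layout (a classification label paired with a storage timestamp), the function names and the `USER_LABELS` mapping are all assumptions introduced for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mapping from a classification label to a user label.
USER_LABELS = {"history": "history enthusiast", "geography": "geography enthusiast"}


def most_read_label(records, start, end):
    """Return the classification label with the most knowledge data in [start, end).

    `records` is an iterable of (classification_label, stored_at) pairs;
    this layout is an assumption made for the sketch.
    """
    counts = Counter(label for label, stored_at in records if start <= stored_at < end)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def user_label(records, start, end):
    """Derive a user label from the most frequent classification label."""
    label = most_read_label(records, start, end)
    # Fall back to the classification label itself when no mapping is defined.
    return USER_LABELS.get(label, label)


if __name__ == "__main__":
    demo = [
        ("history", datetime(2019, 8, 1)),
        ("history", datetime(2019, 8, 2)),
        ("geography", datetime(2019, 8, 3)),
    ]
    print(user_label(demo, datetime(2019, 8, 1), datetime(2019, 9, 1)))
```

The historical time period is modeled here as a half-open interval, which is one possible choice; the source does not specify how the period is bounded.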
According to a fourth aspect of embodiments of the present application, there is provided a data processing apparatus, the apparatus including:
a recording unit for recording the displayed text data in response to a data processing request of a user;
the processing unit is used for determining knowledge data and classification labels of the knowledge data, wherein the knowledge data and the classification labels of the knowledge data are determined by segmenting the text data to obtain segmented data and comparing the segmented data with prestored data in a knowledge base;
and the display unit is used for displaying the knowledge data under the classification label of the knowledge data.
In an optional embodiment, the display unit is specifically configured to:
and responding to a knowledge display request of a user, displaying the knowledge data under the classification label of the knowledge data, and simultaneously displaying an access link of the text data corresponding to the knowledge data.
In an optional embodiment, the processing unit is specifically configured to:
determining valid data from the text data;
for any effective data, fragmenting the effective data to obtain fragmented data;
and comparing the fragment data with pre-stored data in the knowledge base, determining knowledge data from all the fragment data, and determining the classification label of each knowledge data.
According to a fifth aspect of embodiments herein, there is provided a computing device comprising at least one processor, and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the data processing method provided herein.
According to a sixth aspect of the embodiments of the present application, there is provided a storage medium storing computer instructions, which, when run on a computer, cause the computer to perform the steps of the data processing method provided by the embodiments of the present application.
In the embodiment of the application, the terminal application responds to a request of a user, records the displayed text data and the user behavior data of each piece of text data, and sends the text data to the server. The server fragments the text data to obtain fragment data, compares the fragment data with pre-stored data in a knowledge base, determines knowledge data from the fragment data, and determines a classification label of each piece of knowledge data. The server stores the knowledge data and the corresponding classification labels and sends them to the terminal application, so that the terminal application can display the received knowledge data and the classification labels of the knowledge data to the user. In this way, the text read by the user is recorded, the user's reading content is statistically analyzed from the perspective of a knowledge system, and the knowledge points the user has read are summarized and presented to the user under the classifications of the knowledge system, so that the user can quickly and systematically find the acquired knowledge points after reading. In addition, the embodiment of the application lets the user intuitively perceive that valuable knowledge has been obtained through reading, which improves the user's satisfaction with and sense of value from reading, avoids the feeling that reading information flow products wastes time, and increases user stickiness and the time the user spends reading information flow products.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application.
FIG. 1 is a system architecture diagram of a data processing system in an embodiment of the present application;
FIG. 2 is a schematic diagram of a possible interface of a terminal in an embodiment of the present application;
FIG. 3 is a flow chart of a data processing method in an embodiment of the present application;
FIGS. 4a to 4f are schematic diagrams of a display interface of a terminal according to an embodiment of the present application;
FIG. 5 is a flowchart of a data processing method according to a first embodiment of the present application;
FIG. 6 is a flowchart of a data processing method according to a second embodiment of the present application;
FIG. 7 is a block diagram showing a structure of a data processing apparatus according to an embodiment of the present application;
FIG. 8 is a block diagram showing the structure of another data processing apparatus according to the embodiment of the present application;
FIG. 9 is a block diagram illustrating a server according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
Some concepts related to the embodiments of the present application are described below.
Information flow: a recommendation system for information content can record user data in various ways, build user profiles from the data, infer the users' interests and hobbies, and recommend information content that the users are interested in. An information flow product is the information content that the system recommends to the user.
Knowledge system: systematic, comprehensive knowledge of one or more domains, for example an educational knowledge system (languages, history, mathematics, geography, etc.) or an enterprise knowledge system (management, economy, policy, etc.). A knowledge system can help people solve problems in life and work and can be used to evaluate people's learning, cognition and understanding abilities.
Text data: the information content pushed and displayed by the terminal application to the user generally includes text data, picture data, video data, and the like. The information content referred to in the embodiments of the present application is text data.
Read content: the content that the user has seen in the information flow of the terminal application, i.e. the text data that the terminal application displays to the user in the embodiments of the present application.
User behavior: all data generated while the user uses the information flow product (including download amount, use frequency, access amount, access rate, retention time and the like). The embodiments of the present application mainly concern the user's reading time, reading duration and the like.
Valid data: text data screened from all the text data according to the corresponding user behavior data, namely text data that the terminal application displayed for longer than a duration threshold. In short, valid data is an article that the user actually read.
Fragment data: a piece of text can generally be fragmented by sentence, in which case one piece of fragment data is one sentence.
Knowledge data: the knowledge points that the user acquires while browsing articles in the terminal application. Knowledge points are also stored in the knowledge base, generally under a classification label of the knowledge system.
Referring to FIG. 1, a system architecture diagram of a data processing system, including a client 101 and a server 102, is shown according to an embodiment of the present application. The client 101 is an Application program (APP); the server 102 is a server corresponding to the client 101. The user can log in the client 101 with his own account information.
The client 101 is installed in the terminal 103. The terminal 103 may be an electronic device with a wireless communication function, such as a mobile phone, a tablet computer, or a dedicated handheld device, and may also be a device connected to the internet in a wired access manner, such as a Personal Computer (PC), a notebook computer, or a server. Server 102 may be a computer or other network device. The server 102 may be a stand-alone device or a server cluster formed by a plurality of servers. Preferably, the server 102 may employ cloud computing technology for information processing.
The client 101 may communicate with the server 102 through the Internet, or through a mobile communication system such as a Global System for Mobile Communications (GSM) system or a Long Term Evolution (LTE) system.
The embodiment of the present application provides a preferred implementation, introduced here with a mobile phone as the example terminal. Fig. 2 illustrates a possible interface of the terminal. As shown in fig. 2, a plurality of APPs, such as video, clock, call record, information, secure mailbox, mobile phone, S memo, setting, etc., are installed on the terminal. In the embodiment of the application, a client, such as an information flow product APP104, may be installed in the terminal in advance. While the user browses text data in the APP104, the information flow product APP104 records the text data browsed by the user and the user behavior data of each piece of text data, and sends the text data and the user behavior data to the server 102. The server 102 determines the knowledge data in the content read by the user and the classification label of the knowledge data according to the text data and the user behavior data. After receiving the knowledge data acquisition request sent by the information flow product APP104, the server 102 sends the knowledge data and the classification labels of the knowledge data to the information flow product APP104. The information flow product APP104 then displays to the user the knowledge data acquired through reading, together with the classification labels of the knowledge data.
In addition, the terminal may also complete the above-mentioned determination and pushing process of the knowledge data through the browser, and the specific process is similar to that of the client, which is not described herein again.
It should be noted that the above-mentioned application scenarios are only presented for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, the embodiments of the present application may be applied to any applicable scenario.
The following describes a data processing method provided in the embodiment of the present application with reference to an application scenario shown in fig. 1.
Referring to fig. 3, an embodiment of the present application provides a data processing method, as shown in fig. 3, the method includes:
step S301: the terminal application records the displayed text data in response to a data processing request of a user.
In a specific implementation process, the terminal application may display a data processing authorization option to the user and record the displayed text data in response to the user's selection of that option. Here, one or more pieces of displayed text data may be recorded.
For example, after the user logs in to the terminal application, a dialog box as shown in fig. 4a pops up in the display interface. The dialog box includes a deny option and an allow option. If the user selects deny, the data processing process in the embodiment of the application is stopped, and information flow products are merely pushed and displayed to the user; if the user selects allow, the displayed text data and the corresponding user behavior data are recorded while information flow products are pushed and displayed to the user.
In another embodiment, the terminal application may provide the user with a setting option of whether to perform data processing during reading when the user logs in or registers an account for the first time. If the user selects yes, the terminal application can automatically record the displayed text data and the user behavior data of each piece of text data in the subsequent reading process of the user in the application. And if the user selects no, the process is not executed. The user can also change the setting at will in the subsequent use process, thereby selecting whether to acquire knowledge points.
Step S302: the terminal application sends the text data to the server.
In a specific implementation process, the terminal application may send the text data and the corresponding user behavior data to the server in real time. For example, after the user reads a piece of text data, the terminal application sends the text data and the corresponding user behavior data to the server. The terminal application may also send all text data read by the user and the corresponding user behavior data to the server at a defined frequency, for example every 12 hours. Alternatively, the terminal application may send to the server within a specified time period, for example, between 23:00 and 24:00 it sends all text data read by the user on the current day and the corresponding user behavior data to the server.
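The three sending strategies described above (real time, fixed frequency, fixed time window) can be sketched as a single decision function. The mode names, the function name and its parameters are assumptions made for illustration; only the 12-hour interval and the 23:00 to 24:00 window come from the examples in the text.

```python
from datetime import datetime, time, timedelta


def should_send(mode, now, last_sent=None,
                interval=timedelta(hours=12), window_start=time(23, 0)):
    """Decide whether the terminal application should upload recorded data now.

    mode is one of the hypothetical names "realtime", "interval" or "window".
    """
    if mode == "realtime":
        # Send as soon as a piece of text data has been read.
        return True
    if mode == "interval":
        # Send when the configured interval has elapsed since the last upload.
        return last_sent is None or now - last_sent >= interval
    if mode == "window":
        # Send only inside the daily window that runs from window_start to midnight.
        return now.time() >= window_start
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(should_send("window", datetime(2019, 8, 30, 23, 30)))
```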
Step S303: the server fragments the text data to obtain fragment data.
Further, in order to ensure the validity of the displayed text data, the step S302 further includes:
the terminal application sends N pieces of text data and the user behavior data of each piece of text data to the server, wherein N is greater than or equal to 1.
In step S303, the server fragmenting the text data to obtain fragment data specifically includes:
determining valid data from the N pieces of text data according to the user behavior data;
and for any piece of valid data, fragmenting the valid data to obtain fragment data.
In a specific implementation process, the server may fragment the valid data by paragraph, so that each piece of fragment data is a paragraph; by sentence, so that each piece of fragment data is a sentence; or by word, so that each piece of fragment data is a word. In addition, if the text data includes both Chinese text and English text, each continuous run of English text is used as one piece of fragment data. For example, in the text data shown in fig. 4b, the English text 31, the English text 32, the English text 33 and the English text 34 can be used as 4 pieces of fragment data.
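The fragmentation options above can be illustrated with a short sketch. It is not the method of the application, which leaves fragmentation to the prior art: the regular expressions, the function names and the treatment of each non-empty line as a paragraph are assumptions, and the word mode only splits on whitespace and punctuation, so Chinese text would need a dedicated word segmenter.

```python
import re

# Sentence: a run of non-terminator characters plus an optional terminator
# (Western or Chinese full stop, exclamation mark or question mark).
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]?")
# English run: ASCII words joined by single spaces, apostrophes or hyphens.
_ENGLISH_RUN = re.compile(r"[A-Za-z]+(?:[ '\-][A-Za-z]+)*")


def fragment(text, unit="sentence"):
    """Split text into fragment data by paragraph, sentence or word."""
    if unit == "paragraph":
        parts = text.splitlines()  # assumption: one paragraph per non-empty line
    elif unit == "sentence":
        parts = _SENTENCE.findall(text)
    elif unit == "word":
        parts = re.findall(r"\w+", text)
    else:
        raise ValueError(f"unknown unit: {unit}")
    return [p.strip() for p in parts if p.strip()]


def english_runs(text):
    """Return each continuous run of English text as one piece of fragment data."""
    return _ENGLISH_RUN.findall(text)


if __name__ == "__main__":
    print(fragment("The Tang dynasty began in 618. It ended in 907."))
    print(english_runs("我喜欢 machine learning 和 deep learning。"))
```

Note that the simple sentence pattern also splits at decimal points and abbreviations; a production system would likely use a more careful splitter.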
The specific text fragmentation method is prior art known to those skilled in the art, and the details are not described herein.
The user behavior data may include a display duration of the text data and a display time of the text data. The display duration of the text data corresponds to how long the user read the text data, and the display time of the text data corresponds to the time point at which the user read the text data. In a specific implementation process, valid data can be determined from the text data based on the reading duration of the user or the reading time point of the user.
Preferably, the valid data is determined based on the reading duration of the user. In this case, the user behavior data is the display duration of the text data. In step S303, the server determining valid data from the N pieces of text data according to the user behavior data includes:
comparing the display duration of the text data with a duration threshold, and taking the text data whose display duration is greater than the duration threshold as valid data.
For example, the terminal application sends 5 pieces of text data to the server, and the display durations of the 1st to 5th pieces of text data are 2 seconds, 2 minutes, 5 seconds, 1.5 minutes and 10 seconds, respectively. The duration threshold is set to 1 minute, that is, text data displayed for longer than 1 minute is treated as valid data. The 2nd and 4th of the 5 pieces of text data are therefore valid data. The remaining text data may be regarded as content opened by mistake or content that the user is not interested in. Filtering in this way improves the accuracy of the summarized knowledge points.
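The duration-based screening in this example can be reproduced with a few lines of code. The function name and the parallel-list layout are assumptions for illustration; the durations and the 1-minute threshold are the values from the example, expressed in seconds.

```python
def select_valid(texts, display_seconds, threshold_seconds=60):
    """Keep the pieces of text data displayed for longer than the duration threshold."""
    return [
        text
        for text, seconds in zip(texts, display_seconds)
        if seconds > threshold_seconds  # strictly greater, as in the description
    ]


if __name__ == "__main__":
    texts = ["text 1", "text 2", "text 3", "text 4", "text 5"]
    durations = [2, 120, 5, 90, 10]  # 2 s, 2 min, 5 s, 1.5 min, 10 s
    print(select_valid(texts, durations))  # ['text 2', 'text 4']
```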
Step S304: the server compares the fragment data with pre-stored data in a knowledge base, determines knowledge data from all the fragment data, and determines the classification label of each knowledge data.
The knowledge base of the embodiment of the application stores the pre-stored data. It may be maintained by the server or by another party, and it opens a calling interface to the server of the embodiment of the application.
In the embodiment of the application, the fragment data may be compared with the pre-stored data by text similarity matching, that is, a text matching algorithm is used to compare the semantic similarity between the fragment data and the pre-stored data. Fragment data whose similarity is greater than a threshold is taken as knowledge data. Meanwhile, the classification label of the knowledge data is determined according to the classification label of the matching pre-stored data in the knowledge base. The text matching algorithm is prior art and is not described in detail here.
Step S305: the server sends the knowledge data and the classification labels of the knowledge data to the terminal application.
In the embodiment of the application, after the server determines the knowledge data and the corresponding classification tags, the knowledge data and the corresponding classification tags are stored. The server can actively send the knowledge data to the terminal application, and can also send the knowledge data to the terminal application after receiving a knowledge data acquisition request sent by the terminal application.
Preferably, the server sends the knowledge data passively. That is, the server sending the knowledge data and the classification labels of the knowledge data to the terminal application includes:
receiving a knowledge data acquisition request sent by a terminal application;
and sending a knowledge data acquisition response to the terminal application, wherein the knowledge data acquisition response comprises knowledge data and the classification labels of the knowledge data.
In a specific implementation process, the server can store the knowledge data by classification label. The knowledge data acquisition request sent by the terminal application may omit a specific classification label, in which case the server sends all the determined knowledge data and the classification labels of the knowledge data to the terminal application. The knowledge data acquisition request may also include a classification label, for example the classification label of historical knowledge. In that case the server only needs to send the knowledge data whose classification label is historical knowledge to the terminal application.
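The request handling described above can be sketched as follows. The in-memory store (a dictionary from classification label to a list of knowledge data), the function name and the response field names are hypothetical; the source does not define a wire format.

```python
def build_acquisition_response(store, requested_label=None):
    """Build a knowledge data acquisition response.

    store: dict mapping a classification label to a list of knowledge data.
    requested_label: optional classification label carried in the request;
    when absent, all stored knowledge data is returned.
    """
    labels = list(store) if requested_label is None else [requested_label]
    return [
        {"label": label, "knowledge": text}
        for label in labels
        for text in store.get(label, [])  # unknown labels yield nothing
    ]


if __name__ == "__main__":
    store = {
        "history": ["The Tang dynasty was founded in 618."],
        "geography": ["The Nile flows north."],
    }
    print(build_acquisition_response(store, "history"))
```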
Step S306: the terminal application displays the knowledge data under the classification label of the knowledge data.
In a specific implementation, the terminal application may display an interactive interface as shown in fig. 4c to the user. After the user clicks on the read knowledge option, the terminal application displays different category labels to the user, as shown in fig. 4d, including history, english, data, geography, etc. And when the user clicks the classification label, the terminal application displays the corresponding knowledge data to the user. For example, as shown in fig. 4e, when the user clicks on the history tag in fig. 4d, the terminal application displays the knowledge data with the classification tag as history as shown in fig. 4e to the user in response to the user's selection.
Further, in order to facilitate the user to search for the original text, in step S306, the displaying, by the terminal application, the knowledge data under the classification label of the knowledge data includes:
and responding to a knowledge display request of a user, displaying the knowledge data under the classification label of the knowledge data, and simultaneously displaying an access link of the text data corresponding to the knowledge data.
For example, below each piece of knowledge data, an access link of text data corresponding to the knowledge data may also be attached. Still taking the above fig. 4e as an example, after the user clicks the history tag, the terminal responds to the selection of the user, displays the knowledge data whose classification tag is history to the user, and at the same time, an access link of text data is attached below each piece of knowledge data, and displays the access link in the form of a title corresponding to the original text.
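A text-only sketch of this display logic is given below: knowledge data is grouped under its classification label, and each entry is followed by an access link shown as the title of the original text. The item fields (`label`, `knowledge`, `title`, `url`), the function name and the example URLs are assumptions for illustration; a real terminal application would render interface elements instead of strings.

```python
def render_knowledge(items):
    """Group knowledge data by classification label and attach source links."""
    grouped = {}
    for item in items:
        # dict preserves insertion order, so labels appear in first-seen order.
        grouped.setdefault(item["label"], []).append(item)

    lines = []
    for label, entries in grouped.items():
        lines.append(f"[{label}]")
        for entry in entries:
            lines.append(f"  - {entry['knowledge']}")
            # The access link is displayed as the title of the original text.
            lines.append(f"    source: {entry['title']} <{entry['url']}>")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        {"label": "history", "knowledge": "k1", "title": "T1",
         "url": "https://example.com/a1"},
        {"label": "geography", "knowledge": "k2", "title": "T2",
         "url": "https://example.com/a2"},
    ]
    print(render_knowledge(demo))
```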
In the embodiment of the application, the terminal application responds to a request of a user, records the displayed N pieces of text data and the user behavior data of each piece of text data, and sends the N pieces of text data and the corresponding user behavior data to the server. The server determines valid data from the N pieces of text data according to the received user behavior data and, for any piece of valid data, fragments the valid data to obtain fragment data. The server compares the fragment data with pre-stored data in a knowledge base, determines knowledge data from the fragment data, and determines a classification label of each piece of knowledge data. The server then stores the knowledge data and the corresponding classification labels. After that, the terminal application sends a knowledge data acquisition request to the server, and the server sends a knowledge data acquisition response to the terminal application, wherein the knowledge data acquisition response comprises the knowledge data and the classification labels of the knowledge data, so that the terminal application can display the received knowledge data and the classification labels of the knowledge data to the user. In this way, by recording the user behavior data, the user's reading content is statistically analyzed from the perspective of a knowledge system, and the knowledge points read by the user are summarized and presented to the user under the classifications of the knowledge system, so that the user can quickly and systematically find the acquired knowledge points after reading.
In addition, the embodiment of the application lets the user intuitively perceive that valuable knowledge has been obtained through reading, which improves the user's satisfaction with and sense of value from reading, avoids the feeling that reading information flow products wastes time, and increases user stickiness and the time the user spends reading information flow products.
The source of the pre-stored data in the embodiment of the application can be knowledge data in various forms including electronic documents, databases, digital documents, digital books, electronic newspapers and the like in a network. Preferably, the pre-stored data in the knowledge base is stored in a classified mode according to set rules.
In the above step S305, the server comparing the fragment data with pre-stored data in a knowledge base, determining knowledge data from all the fragment data, and determining the classification label of each piece of knowledge data includes:
for any piece of fragment data, the server performs similarity matching between the fragment data and all pre-stored data in the knowledge base, determines the highest similarity value, and determines the classification of the pre-stored data that has the highest similarity value with the fragment data;
and the server takes the fragment data whose highest similarity value is greater than the similarity threshold as knowledge data, and takes the classification label of the pre-stored data that has the highest similarity value with the knowledge data as the classification label of the knowledge data.
In a specific implementation, the pre-stored data is classified and stored in the knowledge base according to set rules. The set rules may follow different knowledge systems. For example, in an educational knowledge system, pre-stored data can be stored under classifications such as language, mathematics, English, history, and geography; in a professional knowledge system, it can be stored under classifications such as management, science and technology, economy, and trade.
After the server fragments the effective data to obtain fragment data, it performs similarity matching between each piece of fragment data and all pre-stored data. For example, fragment data F is compared with all pre-stored data in the knowledge base, yielding similarity values of 25%, 53%, 84%, 43%, and so on. The pre-stored data most similar to fragment data F is pre-stored data Y under the history label, with a similarity of 84%. If the similarity threshold is set to 80%, the highest similarity value exceeds the threshold, so fragment data F is taken as knowledge data and the classification label of pre-stored data Y, history, is used as the classification label of fragment data F. If the similarity threshold is set to 90%, the highest similarity value is below the threshold, so fragment data F is not taken as knowledge data and can be discarded directly.
Generally, fragment data whose similarity to every piece of pre-stored data in the knowledge base is below the similarity threshold can be regarded either as not being knowledge content or as knowledge content that contains relatively serious errors. By performing similarity matching between the fragment data and the pre-stored data in the knowledge base, the embodiment of the application determines the knowledge data and helps ensure its accuracy.
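The matching and thresholding described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the source does not name a similarity measure, so `difflib.SequenceMatcher` from the Python standard library is used as a stand-in, and the knowledge base contents, the function names, and the 0.8 default threshold are assumptions chosen to mirror the 80% example.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical knowledge base: (classification label, pre-stored text) pairs.
KNOWLEDGE_BASE: List[Tuple[str, str]] = [
    ("history", "The Qin dynasty unified China in 221 BC."),
    ("geography", "The Yangtze is the longest river in Asia."),
    ("mathematics", "A prime number has exactly two positive divisors."),
]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the real measure is not specified."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_fragment(
    fragment: str,
    knowledge_base: List[Tuple[str, str]],
    threshold: float = 0.8,
) -> Optional[Tuple[str, float]]:
    """Return (label, highest similarity) if the fragment counts as knowledge data.

    The fragment is compared with every pre-stored entry. It is kept only when
    its highest similarity is strictly greater than the threshold; otherwise
    None is returned and the caller can discard the fragment.
    """
    best_label: Optional[str] = None
    best_score = 0.0
    for label, text in knowledge_base:
        score = similarity(fragment, text)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score > threshold:
        return best_label, best_score
    return None
```

A linear scan over all pre-stored data follows the description literally; a production system would more likely use an index or embedding search, which the source does not discuss.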
To facilitate subsequent user profiling, or pushing information to the user based on what the user reads, a user label can be determined from the classification of knowledge that the user reads most. In one possible embodiment, the method further comprises:
counting the number of knowledge data corresponding to each classification label in a historical time period;
determining the classification label with the maximum number of corresponding knowledge data, and determining a user label based on the classification label with the maximum number of knowledge data;
and sending the user tag to the terminal application.
The historical time period in the embodiment of the present application may run from user registration or the first reading to the current time point, may run from the time when the user first authorized the terminal application to record display data to the current time point, or may be a fixed period ending at the current time point, such as the last three months. The server counts the number of pieces of knowledge data corresponding to each classification tag in the historical time period, where the number may be the absolute count of knowledge data corresponding to the classification tag or the percentage of all knowledge data that corresponds to the classification tag.
Still taking fig. 4d as an example, the figure shows that the knowledge data corresponding to the history tag accounts for 70% of all knowledge data, the knowledge data corresponding to the English tag accounts for 20%, the knowledge data corresponding to the math tag accounts for 50%, and the knowledge data corresponding to the geography tag accounts for 50%. Since the classification tag with the largest number of corresponding knowledge data is history, the user tag is determined to be "history knowledge expert" based on the history tag. Information related to history can subsequently be pushed to the user according to the history knowledge expert label.
Taking fig. 4f as an example, if the knowledge data corresponding to the management tag accounts for 70% of all knowledge data, the knowledge data corresponding to the science and technology tag accounts for 20%, the knowledge data corresponding to the economy tag accounts for 50%, and the knowledge data corresponding to the trade tag accounts for 50%, then the classification tag with the largest number of corresponding knowledge data is management, and the user tag is determined to be "management knowledge expert" based on the management tag. Information related to management can subsequently be pushed to the user according to the management knowledge expert label.
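The counting and user-label steps above can be sketched as below. This is an illustration under stated assumptions: the record layout (label plus storage time), the 90-day default window standing in for "the last three months", and the wording produced by `format_user_label` are all hypothetical, since the source only says that the user label is based on the classification label with the most knowledge data.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple


def count_labels(
    records: Iterable[Tuple[str, datetime]],
    now: datetime,
    window: timedelta = timedelta(days=90),
) -> Counter:
    """Count knowledge data per classification label inside the time window.

    Each record is an assumed (classification_label, stored_at) pair.
    """
    start = now - window
    return Counter(label for label, stored_at in records if start <= stored_at <= now)


def label_shares(counts: Counter) -> Dict[str, float]:
    """Express each label's count as a fraction of all counted knowledge data."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def top_label(counts: Counter) -> Optional[str]:
    """Return the label with the most knowledge data, or None if there is none."""
    return counts.most_common(1)[0][0] if counts else None


def format_user_label(label: str) -> str:
    """Hypothetical wording for the user label derived from the top label."""
    return f"{label} knowledge expert"
```

Computing shares from the same counts guarantees that they sum to 1, which makes the result easy to sanity-check before it is shown to the user.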
In the above embodiment, the data processing flow involves two execution subjects, namely the terminal application and the server. In this scenario, the terminal application only needs to record the text data read by the user, and the text similarity matching is handled by the server, which reduces the storage and computing load on the terminal.
Alternatively, the data processing flow in the embodiment of the present application may involve only one execution subject, the terminal application. In this case, the terminal needs to download the knowledge base in advance and must itself perform the fragmentation of the text data, the comparison of the fragment data with the pre-stored data, and so on, which places higher demands on the terminal's storage and computing capacity.
In this scenario, the terminal application determining the knowledge data and the classification labels of the knowledge data includes:
determining effective data from the text data;
for any effective data, the effective data is segmented to obtain segmented data;
and comparing the fragment data with pre-stored data in a knowledge base, determining knowledge data from all the fragment data, and determining the classification label of each knowledge data.
The specific execution process of the terminal application is similar to that of the server, and the difference is that the interaction between the terminal application and the server is omitted. Therefore, it is not described in detail here.
The above flow is described in detail below with specific embodiments. The system architecture of the first embodiment includes a terminal browser and a data processing server. User A authorizes the browser of the terminal to record what user A reads through web pages, so that the knowledge user A reads on website B can be determined. As shown in fig. 5, the following description takes a browser as the terminal application.
The browser responds to the operation of user A and records the N pieces of text data displayed to the user and the display duration of each piece of text data.
The browser sends the N pieces of text data and the display duration of each piece of text data to the server.
The server compares the display duration of each piece of text data with a duration threshold and takes the text data whose display duration is greater than the duration threshold as effective data.
The server fragments each piece of effective data to obtain fragment data.
The server calls an interface of the knowledge base and, for any piece of fragment data, performs similarity matching between the fragment data and the pre-stored data in the knowledge base, and determines the highest similarity value and the classification of the pre-stored data most similar to the fragment data.
The server takes the fragment data whose highest similarity value is greater than the similarity threshold as knowledge data, and takes the classification label of the pre-stored data most similar to the knowledge data as the classification label of the knowledge data.
The server stores the knowledge data and the classification labels of the knowledge data.
The browser responds to the operation of user A and sends a knowledge data acquisition request to the server.
The server determines the user label based on the classification label with the largest amount of knowledge data.
The server sends a knowledge data acquisition response to the browser. The knowledge data acquisition response includes the knowledge data, the classification labels of the knowledge data, and the user label of user A.
The browser displays the knowledge data under the classification labels of the knowledge data and displays the user label. Here, all the knowledge data may be displayed together with the classification label of each piece of knowledge data, or only the knowledge data corresponding to the classification label selected by user A may be displayed.
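The first two server-side steps of this flow, selecting effective data by display duration and fragmenting it, can be sketched as follows. This is only an illustration: the `DisplayedText` record, the 30-second default threshold, and sentence-level splitting are assumptions, because this section does not state the threshold value or how text is fragmented.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class DisplayedText:
    """One piece of text data reported by the browser (hypothetical shape)."""
    title: str
    body: str
    display_seconds: float


def select_effective(texts: List[DisplayedText], min_seconds: float = 30.0) -> List[DisplayedText]:
    """Keep text whose display duration is greater than the duration threshold."""
    return [t for t in texts if t.display_seconds > min_seconds]


def fragment(body: str) -> List[str]:
    """Split text into sentence-like fragments (assumed fragmentation rule).

    The split point is placed after Western or Chinese sentence-ending
    punctuation. This is deliberately naive: it also splits inside decimals
    and abbreviations.
    """
    parts = re.split(r"(?<=[.!?。！？])\s*", body)
    return [p.strip() for p in parts if p.strip()]
```

Each fragment produced this way would then go through the similarity matching step against the knowledge base.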
The second embodiment relates to a client in a terminal, and the execution subject of the process is the client, as shown in fig. 6, including:
the client responds to a data processing request of a user and records the N pieces of displayed text data and the user behavior data of each piece of text data, wherein N is greater than or equal to 1;
the client determines effective data from the N pieces of text data according to the user behavior data;
for any piece of effective data, the client fragments the effective data to obtain fragment data;
the client compares the fragment data with pre-stored data in the knowledge base, determines knowledge data from all the fragment data, and determines the classification label of each piece of knowledge data;
and the client responds to a knowledge display request of the user, displays the knowledge data under the classification label of the knowledge data, and simultaneously displays the access link of the text data corresponding to the knowledge data.
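The final display step, showing knowledge data under its classification label together with an access link to the source text, can be sketched as below. The `KnowledgeItem` fields, the plain-text rendering, and the example.com URLs used when trying it out are assumptions for illustration; the source only requires that the link appear with each piece of knowledge data, shown as the title of the original text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class KnowledgeItem:
    """A piece of knowledge data plus the text it came from (hypothetical shape)."""
    text: str
    label: str
    source_title: str
    source_url: str


def group_for_display(
    items: List[KnowledgeItem], selected_label: Optional[str] = None
) -> Dict[str, List[KnowledgeItem]]:
    """Group knowledge data by classification label.

    When selected_label is given, only that label is kept, which corresponds
    to the user clicking a single tag.
    """
    grouped: Dict[str, List[KnowledgeItem]] = defaultdict(list)
    for item in items:
        if selected_label is None or item.label == selected_label:
            grouped[item.label].append(item)
    return dict(grouped)


def render(grouped: Dict[str, List[KnowledgeItem]]) -> str:
    """Render each label, its knowledge data, and the source link below it."""
    lines: List[str] = []
    for label in sorted(grouped):
        lines.append(f"[{label}]")
        for item in grouped[label]:
            lines.append(f"  {item.text}")
            lines.append(f"    source: {item.source_title} <{item.source_url}>")
    return "\n".join(lines)
```

Keeping grouping separate from rendering lets the same grouped data back either the all-labels view or the single-label view.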
The following are embodiments of the apparatus of the present application, and for details not described in detail in the embodiments of the apparatus, reference may be made to the above-mentioned one-to-one corresponding method embodiments.
Referring to fig. 7, a block diagram of a data processing apparatus according to an embodiment of the present application is shown. The data processing apparatus is implemented as all or a portion of the server 102 in fig. 1 by hardware or a combination of hardware and software. The apparatus includes: a transceiving unit 601, a fragmentation unit 602, a comparison unit 603, and a statistical unit 604.
The transceiving unit 601 is configured to receive text data sent by a terminal application, where the text data is text data displayed in the terminal application;
the fragmentation unit 602 is configured to fragment the text data to obtain fragment data;
the comparison unit 603 is configured to compare the fragment data with pre-stored data in the knowledge base, determine knowledge data from all the fragment data, and determine a classification label of each piece of knowledge data;
the transceiving unit 601 is further configured to send the knowledge data and the classification labels of the knowledge data to the terminal application.
In an optional embodiment, the transceiving unit 601 is specifically configured to receive N pieces of text data sent by the terminal application and user behavior data of each piece of text data, where N is greater than or equal to 1;
the fragmentation unit 602 is further configured to determine effective data from the N pieces of text data according to the user behavior data and, for any piece of effective data, fragment the effective data to obtain fragment data.
In an optional embodiment, the pre-stored data in the knowledge base is stored in a classified manner according to set rules; the comparison unit 603 is specifically configured to:
for any piece of fragment data, performing similarity matching on the piece of fragment data and all pre-stored data in a knowledge base, determining a highest similarity value, and determining the classification of the pre-stored data with the highest similarity value with the piece of fragment data;
and taking the fragment data with the highest similarity value larger than the similarity threshold value as knowledge data, and taking the classification label of the prestored data with the highest similarity value with the knowledge data as the classification label of the knowledge data.
In an optional embodiment, the transceiving unit 601 is specifically configured to:
receiving a knowledge data acquisition request sent by the terminal application;
and sending a knowledge data acquisition response to the terminal application, wherein the knowledge data acquisition response comprises the knowledge data and the classification labels of the knowledge data.
In an alternative embodiment, the statistical unit 604 is further configured to: counting the quantity of knowledge data under each classification label in a historical time period; determining the classification label with the maximum number of corresponding knowledge data, and determining a user label based on the classification label with the maximum number of knowledge data;
the transceiving unit 601 is further configured to send the user tag to the terminal application.
Referring to fig. 8, a block diagram of a data processing apparatus according to an embodiment of the present application is shown. The data processing apparatus is implemented as all or a part of the terminal 103 in fig. 1 by hardware or a combination of hardware and software. The apparatus includes: a recording unit 701, a processing unit 702, and a display unit 703.
A recording unit 701 for recording the displayed text data in response to a data processing request of a user;
a processing unit 702, configured to determine knowledge data and a classification label of the knowledge data, where the knowledge data and the classification label of the knowledge data are determined by segmenting the text data to obtain segmented data and comparing the segmented data with pre-stored data in a knowledge base;
a display unit 703, configured to display the knowledge data under the classification label of the knowledge data.
In an optional embodiment, the display unit 703 is specifically configured to:
and responding to a knowledge display request of a user, displaying the knowledge data under the classification label of the knowledge data, and simultaneously displaying an access link of the text data corresponding to the knowledge data.
In an alternative embodiment, the processing unit 702 is specifically configured to:
determining valid data from the text data;
for any effective data, fragmenting the effective data to obtain fragmented data;
and comparing the fragment data with pre-stored data in the knowledge base, determining knowledge data from all the fragment data, and determining the classification label of each knowledge data.
Referring to fig. 9, a block diagram of a server according to an embodiment of the present application is shown. The server 800 is implemented as the server 102 in fig. 1. Specifically:
The server 800 includes a Central Processing Unit (CPU) 801, a system memory 804 including a Random Access Memory (RAM) 802 and a Read-Only Memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The server 800 also includes a basic input/output system (I/O system) 806, which facilitates transfer of information between devices within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and an input device 809, such as a mouse or keyboard, for the user to input information. The display 808 and the input device 809 are both connected to the central processing unit 801 through an input/output controller 810 connected to the system bus 805. The basic input/output system 806 may also include the input/output controller 810 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 810 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 804 and mass storage 807 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 800 may also run on a remote computer connected through a network such as the Internet. That is, the server 800 may be connected to the network 812 through the network interface unit 811 coupled to the system bus 805, or the network interface unit 811 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes one or more programs stored in the memory, the one or more programs including instructions for performing the data processing method provided by embodiments of the present application.
It will be understood by those skilled in the art that all or part of the steps in the data processing method of the above embodiments may be implemented by a program instructing associated hardware, and the program may be stored in a computer-readable storage medium, where the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of data processing, the method comprising:
receiving text data sent by a terminal application, wherein the text data is text data displayed in the terminal application;
fragmenting the text data to obtain fragment data;
comparing the fragment data with pre-stored data in a knowledge base, determining knowledge data from all the fragment data, and determining a classification label of each knowledge data;
and sending the knowledge data and the classification labels of the knowledge data to the terminal application.
2. The method of claim 1, wherein receiving the text data sent by the terminal application comprises:
receiving N pieces of text data sent by the terminal application and user behavior data of each piece of text data, wherein N is greater than or equal to 1;
and wherein fragmenting the text data to obtain fragment data comprises:
determining effective data from the N pieces of text data according to the user behavior data;
and for any piece of effective data, fragmenting the effective data to obtain fragment data.
3. The method of claim 1, wherein the pre-stored data in the knowledge base are classified and stored according to set rules; the step of comparing the fragment data with pre-stored data in a knowledge base, determining knowledge data from all the fragment data, and determining the classification label of each knowledge data comprises the following steps:
for any piece of fragment data, performing similarity matching on the piece of fragment data and all pre-stored data in a knowledge base, determining a highest similarity value, and determining the classification of the pre-stored data with the highest similarity value with the piece of fragment data;
and taking the fragment data with the highest similarity value larger than the similarity threshold value as knowledge data, and taking the classification label of the prestored data with the highest similarity value with the knowledge data as the classification label of the knowledge data.
4. The method of claim 1, wherein said sending the knowledge data and the class labels for the knowledge data to the terminal application comprises:
receiving a knowledge data acquisition request sent by the terminal application;
and sending a knowledge data acquisition response to the terminal application, wherein the knowledge data acquisition response comprises the knowledge data and the classification labels of the knowledge data.
5. The method according to any one of claims 1 to 4, wherein after receiving the knowledge data display request sent by the terminal application, the method further comprises:
counting the quantity of knowledge data under each classification label in a historical time period;
determining the classification label with the maximum number of corresponding knowledge data, and determining a user label based on the classification label with the maximum number of knowledge data;
and sending the user tag to the terminal application.
6. A method of data processing, the method comprising:
the terminal application responds to a data processing request of a user and records the displayed text data;
determining knowledge data and a classification label of the knowledge data, wherein the knowledge data and the classification label of the knowledge data are determined by fragmenting the text data to obtain fragment data and comparing the fragment data with prestored data in a knowledge base;
and displaying the knowledge data under the classification label of the knowledge data.
7. The method of claim 6, wherein the determining knowledge data and class labels for the knowledge data comprises:
determining valid data from the text data;
for any effective data, fragmenting the effective data to obtain fragmented data;
and comparing the fragment data with pre-stored data in the knowledge base, determining knowledge data from all the fragment data, and determining the classification label of each knowledge data.
8. A data processing apparatus, characterized in that the apparatus comprises:
a transceiving unit, configured to receive text data sent by a terminal application, wherein the text data is text data displayed in the terminal application;
the fragmentation unit is used for fragmenting the text data to obtain fragmentation data;
the comparison unit is used for comparing the fragment data with prestored data in a knowledge base, determining knowledge data from all the fragment data and determining a classification label of each knowledge data;
the transceiving unit is further configured to send the knowledge data and the classification label of the knowledge data to the terminal application.
9. A data processing apparatus, characterized in that the apparatus comprises:
a recording unit for recording the displayed text data in response to a data processing request of a user;
the processing unit is used for determining knowledge data and classification labels of the knowledge data, wherein the knowledge data and the classification labels of the knowledge data are determined by segmenting the text data to obtain segmented data and comparing the segmented data with prestored data in a knowledge base;
and the display unit is used for displaying the knowledge data under the classification label of the knowledge data.
10. A computer storage medium having computer-executable instructions stored thereon for performing the data processing method of any one of claims 1 to 5.
CN201910819017.2A 2019-08-30 2019-08-30 Data processing method and device and computer storage medium Pending CN110597989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910819017.2A CN110597989A (en) 2019-08-30 2019-08-30 Data processing method and device and computer storage medium

Publications (1)

Publication Number Publication Date
CN110597989A true CN110597989A (en) 2019-12-20

Family

ID=68856625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910819017.2A Pending CN110597989A (en) 2019-08-30 2019-08-30 Data processing method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN110597989A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106886594A (en) * 2017-02-21 2017-06-23 北京百度网讯科技有限公司 For the method and apparatus of exhibition information
CN109766422A (en) * 2018-12-29 2019-05-17 上海智臻智能网络科技股份有限公司 Information processing method, apparatus and system, storage medium, terminal
CN109977312A (en) * 2019-03-27 2019-07-05 安庆师范大学 A kind of knowledge base recommender system based on content tab

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40019480

Country of ref document: HK

SE01 Entry into force of request for substantive examination