CN113407714A - Data processing method and device based on aging, electronic equipment and storage medium - Google Patents

Data processing method and device based on aging, electronic equipment and storage medium

Info

Publication number
CN113407714A
Authority
CN
China
Prior art keywords
content
aging
time
category
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011217879.7A
Other languages
Chinese (zh)
Other versions
CN113407714B (en)
Inventor
石磊
马连洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011217879.7A priority Critical patent/CN113407714B/en
Publication of CN113407714A publication Critical patent/CN113407714A/en
Application granted granted Critical
Publication of CN113407714B publication Critical patent/CN113407714B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide an aging-based data processing method and apparatus, an electronic device, and a storage medium, and relate to the technical fields of cloud technology and artificial intelligence. The method comprises: acquiring content to be processed, wherein the content to be processed comprises text content; determining text features of the text content; determining a first aging category of the content to be processed according to the text features of the text content; if the first aging category is a first category, determining the aging of the content to be processed based on the aging corresponding to the first category; if the first aging category is a second category, determining the aging of the content to be processed based on time keywords in the text content, wherein the aging corresponding to the second category is longer than the aging corresponding to the first category; and processing the content according to its aging. Embodiments of the present application improve the accuracy of the aging determined for the content to be processed, so that the information an application program recommends to users is valid rather than outdated, which improves the user experience.

Description

Data processing method and device based on aging, electronic equipment and storage medium
Technical Field
The application relates to the technical field of cloud technology and artificial intelligence, in particular to a data processing method and device based on aging, an electronic device and a storage medium.
Background
At present, most application programs provide an information recommendation function, and each piece of information has a corresponding aging, that is, the period during which it remains valid. Accurately determining the aging of information therefore largely prevents an application program from recommending outdated information.
In the prior art, the aging of an article may be determined according to whether the article contains a time keyword: if it does, the article is treated as a short-aging article; if it does not, the article is treated as a long-aging article. A time keyword may be, for example, "recently", "the past few days", or "some time ago". Alternatively, the aging of an article may be determined according to the type of the article; for example, the aging of a sports article is 3 days and the aging of a movie article is 7 days. These existing approaches are coarse, so the determined aging is prone to being inaccurate, which in turn causes application programs to recommend outdated information to users and degrades the user experience.
Disclosure of Invention
The application provides an aging-based data processing method and apparatus, an electronic device, and a storage medium that can accurately determine the aging of an article.
In a first aspect, a method for data processing based on aging is provided, the method comprising:
acquiring content to be processed, wherein the content to be processed comprises text content;
determining text characteristics of the text content;
determining a first aging category of the content to be processed according to the text features of the text content;
if the first aging category is a first category, determining the aging of the content to be processed based on the aging corresponding to the first category;
if the first aging category is a second category, determining the aging of the content to be processed based on time keywords in the text content, wherein the aging corresponding to the second category is longer than the aging corresponding to the first category;
and processing according to the aging of the content to be processed.
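The steps of the first aspect can be sketched as follows. This is a minimal illustration, not the implementation described in the application: the category labels, the aging values of 3 and 30 days (borrowed from the information recommendation example given later in the description), and the three callables passed in are all assumptions. The keyword branch follows the optional implementation in which a second aging category is derived from the time keywords.

```python
from typing import Callable, Sequence

FIRST = "first_category"    # shorter aging (hypothetical label)
SECOND = "second_category"  # longer aging (hypothetical label)

# Assumed aging values in days; the method only requires that the second
# category's aging be longer than the first category's.
AGING_DAYS = {FIRST: 3.0, SECOND: 30.0}


def determine_aging(
    text: str,
    extract_features: Callable[[str], Sequence[float]],
    classify_features: Callable[[Sequence[float]], str],
    classify_by_time_keywords: Callable[[str], str],
) -> float:
    """Return the aging (in days) of one piece of text content."""
    # Determine the text features, then the first aging category.
    features = extract_features(text)
    first_aging_category = classify_features(features)

    # First category: use the aging that corresponds to it directly.
    if first_aging_category == FIRST:
        return AGING_DAYS[FIRST]

    # Second category: decide again from the time keywords in the text.
    second_aging_category = classify_by_time_keywords(text)
    return AGING_DAYS[second_aging_category]
```

Passing the feature extractor and the two classifiers in as parameters keeps the sketch independent of any particular model.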
In a second aspect, there is provided an age-based data processing apparatus, the apparatus comprising:
the content acquisition module is used for acquiring the content to be processed, and the content to be processed comprises text content;
the aging category determining module is used for determining the text characteristics of the text content and determining a first aging category of the content to be processed according to the text characteristics of the text content;
the aging determining module is used for determining the aging of the content to be processed based on the aging corresponding to the first category when the first aging category is the first category, and determining the aging of the content to be processed based on the time key words in the text content when the first aging category is the second category, wherein the aging corresponding to the second category is larger than the aging corresponding to the first category;
and the content processing module is used for processing according to the aging of the content to be processed.
In one possible implementation, the apparatus further includes a keyword extraction module;
the keyword extraction module is used for extracting time keywords in the text content and context information of the time keywords;
the aging determining module is specifically configured to, when determining the aging of the content to be processed based on the time keyword in the text content:
determining a second aging category of the content to be processed according to the time keyword and the context information of the time keyword, wherein the second aging category is the first category or the second category;
and determining the aging of the content to be processed based on the aging corresponding to the second aging category.
In a possible implementation manner, when determining the second aging category of the content to be processed according to the time keyword and the context information of the time keyword, the aging determination module is specifically configured to:
extracting the characteristics of the time keywords and the text characteristics of the context information of the time keywords;
and determining a second aging category of the content to be processed according to the characteristics of the time keywords and the text characteristics of the context information of the time keywords.
In a possible implementation manner, when the number of the time keywords is at least two, the aging determination module is specifically configured to, when determining the second aging category of the content to be processed according to the time keywords and the context information of the time keywords:
for each time keyword, determining an aging category corresponding to the time keyword according to the time keyword and the context information of the time keyword;
when the aging categories corresponding to the time keywords are the second categories, determining that the second aging category of the content to be processed is the second category;
and when at least one aging category exists in the aging categories corresponding to the time keywords and is the first category, determining that the second aging category of the content to be processed is the first category.
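The rule for combining the categories of several time keywords can be sketched as follows. The label strings and the function name are hypothetical; only the rule itself (the second category requires every keyword to be in the second category, and a single first-category keyword yields the first category) comes from the text above.

```python
from typing import Iterable

FIRST = "first_category"    # hypothetical label
SECOND = "second_category"  # hypothetical label


def aggregate_second_aging_category(per_keyword: Iterable[str]) -> str:
    """Combine the per-keyword aging categories into one second aging category."""
    categories = list(per_keyword)
    if not categories:
        raise ValueError("at least one time keyword category is required")
    # A single first-category keyword is enough to make the content first-category.
    if any(category == FIRST for category in categories):
        return FIRST
    # Otherwise every keyword is second-category.
    return SECOND
```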
In one possible implementation, the context information of the time keyword includes at least one of:
the target sentence in which the time keyword is located; at least one sentence that precedes and is adjacent to the target sentence; and at least one sentence that follows and is adjacent to the target sentence.
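As an illustration of gathering this context, the sketch below returns the target sentence together with its adjacent sentences. The punctuation-based sentence splitter, the window size of one sentence on each side, and the function names are assumptions made for the example; the application does not prescribe how sentences are segmented.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on Western and Chinese sentence-ending punctuation (assumption)."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [part.strip() for part in parts if part.strip()]


def keyword_context(text: str, keyword: str, window: int = 1) -> List[str]:
    """For each sentence containing the keyword, return it with its neighbours."""
    sentences = split_sentences(text)
    contexts = []
    for index, sentence in enumerate(sentences):
        if keyword in sentence:
            start = max(0, index - window)
            end = min(len(sentences), index + window + 1)
            contexts.append(" ".join(sentences[start:end]))
    return contexts
```

A keyword that occurs in several sentences yields one context per occurrence, which matches the per-keyword classification described above.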
In a possible implementation manner, the aging determining module is specifically configured to, when extracting the time keyword in the text content:
determining and extracting time keywords in the text content according to the text content and a pre-constructed keyword library;
the keyword library is constructed in the following way:
acquiring at least one seed time keyword;
acquiring each candidate word;
determining a target time keyword in each candidate word based on the similarity between each candidate word and the seed time keyword;
and constructing a keyword library based on each seed time keyword and the target time keywords.
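One way to realize this construction is sketched below. It assumes that each word has an embedding vector, that similarity is measured by cosine similarity, and that a candidate becomes a target time keyword when its best similarity to any seed reaches a threshold of 0.8. The text above only says that selection is based on similarity, so the similarity measure, the threshold, and all names here are illustrative.

```python
import math
from typing import Dict, Iterable, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_keyword_library(
    seeds: Iterable[str],
    candidates: Iterable[str],
    vectors: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the application
) -> Set[str]:
    """Return the seed keywords plus every candidate similar enough to a seed."""
    seed_list = list(seeds)
    library = set(seed_list)
    for word in candidates:
        if word in library or word not in vectors:
            continue
        best = max(
            (cosine(vectors[word], vectors[s]) for s in seed_list if s in vectors),
            default=0.0,
        )
        if best >= threshold:
            library.add(word)
    return library
```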
In one possible implementation, the content obtaining module is further configured to:
acquiring the content category of the content to be processed;
the aging determination module is specifically configured to, when determining the aging of the content to be processed based on the aging corresponding to the first category:
determining the aging of the content to be processed based on the aging corresponding to the first category and the aging corresponding to the content category;
the aging determining module is specifically configured to, when determining the aging of the content to be processed based on the time keyword in the text content:
determining the aging of the content to be processed based on the time keywords in the text content and the aging corresponding to the content category.
In a possible implementation manner, the text content includes a title and a body, and the aging category determining module is specifically configured to, when determining the text feature of the text content:
extracting text features of the title and text features of the body;
and fusing the text features of the title and the text features of the body to obtain the text features of the text content.
In a possible implementation manner, the aging category determining module is specifically configured to, when fusing the text features of the title and the text features of the body to obtain the text features of the text content:
concatenating the text features of the title and the text features of the body to obtain the text features of the text content.
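Fusion by concatenation can be sketched in a few lines. The function name and the use of plain Python lists as feature vectors are assumptions for illustration.

```python
from typing import List, Sequence


def fuse_by_concatenation(title_features: Sequence[float],
                          body_features: Sequence[float]) -> List[float]:
    """Concatenate title and body feature vectors into one text feature vector."""
    return list(title_features) + list(body_features)
```

The fused vector has as many dimensions as the two inputs combined, so a classifier that consumes it sees the title and body features side by side.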
In one possible implementation, the content to be processed is recommended content;
the content processing module is specifically configured to:
determining a recommendation time at which the recommended content is to be recommended;
and deleting the recommended content when the difference between the recommendation time and the aging of the recommended content is not less than a set value.
In a third aspect, an electronic device is provided, comprising a memory and a processor, wherein the memory has stored therein a computer program; the processor, when running the computer program, performs the aging-based data processing method as shown in the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, wherein the computer program, when being executed by a processor, implements the aging-based data processing method according to the first aspect.
The technical solution provided by the present application has the following beneficial effects:
The application provides an aging-based data processing method and apparatus, an electronic device, and a storage medium. Compared with the prior art, the aging category of the content to be processed is determined according to the text features of the text content in the content to be processed. If the aging category is the first category, the aging of the content to be processed is determined based on the aging corresponding to the first category; if the aging category is the second category, the aging of the content to be processed is determined based on the time keywords in the text content. In other words, the aging is determined from the text features of the text content as a whole, which greatly improves the accuracy of the determined aging. Corresponding processing is then carried out according to the aging of the content to be processed, so that the information an application program recommends to users is valid rather than outdated, which improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of an aging-based data processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of determining a target time keyword according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an information recommendation system according to an embodiment of the present application;
Fig. 4 is a schematic diagram illustrating an example of determining the aging of an information article according to an embodiment of the present application;
Fig. 5 is a schematic diagram illustrating another example of determining the aging of an information article according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an aging-based data processing apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
In each embodiment of the application, and in any scenario, the aging corresponding to the long-aging category is longer than the aging corresponding to the short-aging category.
It should be noted that the aging corresponding to the long-aging category may differ between scenarios. For example, in an information recommendation application the aging corresponding to the long-aging category may be 30 days, whereas in a stock application it may be only several hours. Likewise, the aging corresponding to the short-aging category may differ between scenarios. For example, in an information recommendation application the aging corresponding to the short-aging category may be 3 days, whereas in a stock application it may be only a few minutes.
In practical applications, the aging corresponding to the long-aging category and the aging corresponding to the short-aging category in a given scenario may be set according to the characteristics of that scenario, which is not limited herein.
In the prior art, determining the aging of an article from time keywords or from the content type of the article is relatively coarse in either case. The articles in a given scenario vary widely in writing style and their contexts are complex, so determining aging with a single fixed strategy leads to errors. An article that is actually long-aging may be classified as short-aging and then can no longer be recommended on the recommendation side; or an article that is actually short-aging may be classified as long-aging, so that its aging result is lengthened. As a result, the number of valid articles on the recommendation side is reduced, which affects the new articles a user obtains on refreshing and degrades the user experience.
For example, consider two sentences that contain the time keyword "some time ago". The first: "Some time ago, the editor found a landmark building on Center Street." The second: "Some time ago, a director of Company A made an announcement at a launch event." The first sentence belongs to the long-aging category, so "some time ago" in it should not be treated as a valid time keyword, that is, it should not be used as a basis for judging the aging result. The second sentence belongs to the short-aging category, so "some time ago" in it should be treated as a valid time keyword and used as a basis for judging the aging result. In the prior art, when the aging of an article is determined according to time keywords alone, both sentences are marked as short-aging.
Therefore, the prior-art way of determining the aging of an article is coarse, and the determined aging result is prone to being inaccurate. To address these problems, the embodiments of the application provide an aging-based data processing method that can effectively solve the problems in the prior art.
Optionally, each optional embodiment provided by the present application may be implemented based on a cloud technology, data processing/computation related to implementation of the scheme may be implemented by using cloud computing, and the obtained content to be processed and each intermediate product, such as a text feature, may be stored in a cloud storage manner, or may be stored in a database based on the cloud technology.
Cloud technology is a hosting technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. Cloud technology is also a general term for the network, information, integration, management platform, and application technologies applied under the cloud computing business model; these can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support for this. Background services of technical network systems, such as video websites, picture websites, and web portals, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, each item may in the future have its own identification mark, which needs to be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong backend system support, which can only be realized through cloud computing.
Cloud computing (cloud computing) is a computing model that distributes computing tasks over a pool of resources formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.
As a basic capability provider of cloud computing, a cloud computing resource pool (called an IaaS (Infrastructure as a Service) platform for short) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to select and use.
According to the logic function division, a PaaS (Platform as a Service) layer can be deployed on an IaaS (Infrastructure as a Service) layer, a SaaS (Software as a Service) layer is deployed on the PaaS layer, and the SaaS can be directly deployed on the IaaS. PaaS is a platform on which software runs, such as a database, a web container, etc. SaaS is a variety of business software, such as web portal, sms, and mass texting. Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
In a narrow sense, cloud computing refers to a delivery and usage mode of IT infrastructure, in which required resources are obtained over a network in an on-demand and easily scalable manner. In a broad sense, cloud computing refers to a delivery and usage mode of services, in which required services are obtained over a network in an on-demand and easily scalable manner. Such services may be IT, software, or internet related, or may be other services. Cloud computing is a product of the development and fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
With the development of diversification of internet, real-time data stream and connecting equipment and the promotion of demands of search service, social network, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Different from the prior parallel distributed computing, the generation of cloud computing can promote the revolutionary change of the whole internet mode and the enterprise management mode in concept.
A distributed cloud storage system (hereinafter, referred to as a storage system) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work by using functions such as cluster application, grid technology, and a distributed storage file system, and provides a data storage function and a service access function to the outside.
At present, a storage system stores data as follows: logical volumes are created, and each logical volume is allocated physical storage space when it is created; the physical storage space may be composed of the disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, on a file system. The file system divides the data into multiple parts, each of which is an object; an object contains not only data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access the data according to the storage location information of each object.
The storage system allocates physical storage space to a logical volume as follows: the physical storage space is divided into stripes in advance, according to an estimate of the capacity of the objects to be stored in the logical volume (the estimate often leaves a large margin over the capacity of the objects actually stored) and the Redundant Array of Independent Disks (RAID) configuration. One logical volume can be understood as one stripe, and physical storage space is thereby allocated to the logical volume.
A database can be regarded, in short, as an electronic filing cabinet, that is, a place for storing electronic files, in which a user can add, query, update, and delete data. A database is a collection of data that is stored together, can be shared by multiple users, has as little redundancy as possible, and is independent of any particular application.
A Database Management System (DBMS) is a computer software system designed for managing databases, and generally provides basic functions such as storage, retrieval, security assurance, and backup. Database management systems may be classified according to the database model they support, such as relational or XML (Extensible Markup Language); according to the type of computer they support, such as server cluster or mobile phone; according to the query language they use, such as SQL (Structured Query Language) or XQuery; according to their performance emphasis, such as maximum scale or maximum operating speed; or in other ways.
The embodiments in the present application may be implemented based on artificial intelligence technology; for example, artificial intelligence technology may be used to determine the text features of the text content and to determine the first aging category and the second aging category of the content to be processed.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.
The embodiment of the application provides a data processing method based on aging, which can be executed by electronic equipment, wherein the electronic equipment can be a terminal device such as a mobile phone, a desktop computer, a notebook computer, a tablet computer, and the like, and can also be a server or a server cluster, and the server can be a physical server, and can also be the aforementioned cloud server, and the like.
Specifically, fig. 1 is a schematic flowchart of a data processing method based on aging according to an embodiment of the present application. Alternatively, the method may be performed by a server, such as a server of an application program, as shown in fig. 1, the method comprising steps S101-S105.
Step S101, obtaining the content to be processed, wherein the content to be processed comprises text content.
The content to be processed may include at least one of text, picture, video, audio, and the like, and the text content may be included in the picture, the video, and the like. For example, the content to be processed may be an article, or may be information.
In the embodiment of the application, text content can be extracted from the content to be processed, and the extracted text content comprises at least one of a plain text, a text in a video, a text in a picture and the like.
Step S102, determining the text characteristics of the text content.
The text feature is a feature vector that represents the text content. The embodiments of the present application do not limit the specific manner of determining the text feature; for example, it may be determined by using a pre-trained neural network model.
As an optional manner, the text content includes a title and a body, and determining the text feature of the text content in step S102 may specifically include: extracting the text feature of the title and the text feature of the body; and fusing the text feature of the title and the text feature of the body to obtain the text feature of the text content.
In the embodiment of the application, a word bank can be constructed in advance, and the text feature of the title and the text feature of the body can be obtained by using the word bank.
Specifically, a large number of text contents can be obtained in advance, each text content can be segmented into words to obtain a large number of words, a word vector of each word can be obtained by using a trained word vector model, and a word bank can be constructed from the word vectors of the words. The dimensionality of the word vectors is not limited.
For example, 1 million articles are obtained in advance and segmented to obtain 2 million words. Word vectors of the 2 million words are obtained by using a trained word-to-vector (word2vec) model, where the word vector of each word has 100 dimensions, and a word bank is constructed from the word vectors of the 2 million words.
When the title consists of one word, the word vector of the title can be determined by using the word bank and used as the text feature of the title. When the title includes at least two words, the title can be segmented to obtain each word, the word vector of each word is determined by using the word bank, and the word vectors of the words are fused to obtain the text feature of the title. During fusion, the word vectors of all the words can be accumulated and normalized.
In general, the body includes at least two words. The body can be segmented to obtain each word, the word vector of each word is determined by using the word bank, and the word vectors of the words are fused to obtain the text feature of the body. During fusion, the word vectors of all the words can be accumulated and normalized.
The dimension of the text feature obtained by accumulating and normalizing the word vectors of all the words is the same as the dimension of the word vector of any word. For example, two 100-dimensional word vectors are accumulated and normalized to obtain a 100-dimensional feature vector.
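The accumulate-and-normalize fusion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `fuse_word_vectors` is hypothetical, and L2 normalization is an assumption, since the text does not state which normalization is used.

```python
import math


def fuse_word_vectors(vectors):
    """Sum equal-length word vectors element-wise, then normalize the sum.

    L2 normalization is an assumption; the source only says "normalized".
    """
    if not vectors:
        raise ValueError("at least one word vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all word vectors must have the same dimension")
    # Element-wise accumulation keeps the dimension unchanged.
    summed = [sum(components) for components in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in summed))
    # Leave an all-zero sum unchanged to avoid division by zero.
    return summed if norm == 0 else [x / norm for x in summed]


# Two toy 2-dimensional word vectors fuse into one 2-dimensional feature.
fused = fuse_word_vectors([[3.0, 0.0], [0.0, 4.0]])
```

The output has the same dimension as each input vector, which matches the statement that two 100-dimensional word vectors yield a 100-dimensional feature vector.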
Further, the text feature of the title and the text feature of the body can be fused to obtain the text feature of the text content. The fusion mode is not limited in the embodiment of the application.
As an optional implementation manner, fusing the text feature of the title and the text feature of the body to obtain the text feature of the text content may specifically include: splicing the text feature of the title and the text feature of the body to obtain the text feature of the text content.
In the embodiment of the application, the text feature of the title can be spliced before or after the text feature of the body to obtain the text feature of the text content.
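Splicing can be read as plain vector concatenation. The sketch below assumes that reading; the function name `splice_features` and the `title_first` flag are hypothetical and only illustrate that either order is allowed.

```python
def splice_features(title_feature, body_feature, title_first=True):
    """Concatenate the title and body feature vectors into one text feature."""
    if title_first:
        return list(title_feature) + list(body_feature)
    return list(body_feature) + list(title_feature)


# Toy 2-dimensional features; real features would be, e.g., 100-dimensional.
text_feature = splice_features([0.6, 0.8], [1.0, 0.0])
```

Under this reading, splicing a 100-dimensional title feature with a 100-dimensional body feature would give a 200-dimensional text feature.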
Step S103, determining a first aging category of the content to be processed according to the text features of the text content.
In the embodiment of the present application, the specific manner of dividing aging categories is not limited. The aging may be divided into two, three, or even more aging categories; for example, it may be divided into three aging categories denoted as aging category 1, aging category 2, and aging category 3.
The first aging category of the content to be processed may be determined according to the text features of the text content, and the determination manner is not limited. The first aging category is one of at least two aging categories; for example, when the aging is divided into aging category 1, aging category 2, and aging category 3, the first aging category may be aging category 1, aging category 2, or aging category 3.
For convenience of description, this embodiment and the embodiments below are described with two aging categories, that is, the first aging category may be a first category or a second category, where the aging corresponding to the second category is greater than the aging corresponding to the first category. To make the relationship between the two categories clear, in this embodiment and the embodiments below the second category is referred to as the long aging category, and the first category is referred to as the short aging category.
As an optional implementation manner, the text features of the text content may be input into a trained aging classification model (here, the aging classification model may be referred to as a first aging classification model) as input information, the aging classification model outputs an aging class, and the aging class output by the aging classification model is used as a first aging class of the content to be processed.
The specific model architecture of the aging classification model is not limited in the embodiment of the present application; any existing classification model can be trained to serve as the aging classification model. Different classification models may be trained for different application scenarios. A binary classification model or a multi-class classification model may be selected according to the actual requirements of the application scenario, which is not limited in the embodiment of the present application.
The embodiment of the present application is described with two aging categories. In this case, the aging classification model may be a binary classification model, and the aging category it outputs may be the long aging category or the short aging category. The binary classification model may be, for example, an eXtreme Gradient Boosting (XGBoost) model.
A training sample set of the aging classification model can be constructed in advance, and the initial model is trained by utilizing the constructed training sample set to obtain the aging classification model. When the training sample set is constructed, a large amount of text contents can be collected in advance, the aging categories of the collected text contents are labeled manually, and the labeled text contents are utilized to construct the training sample set.
For example, 1 million text contents are collected in advance, and the aging category of each text content is labeled as the long aging category or the short aging category. The 1 million labeled text contents form a training sample set, and an initial model is trained on it to obtain the binary classification model.
Of course, determining the first aging category of the content to be processed by using the aging classification model according to the text features of the text content is only one possible implementation. In actual implementation, the aging category of the content to be processed may also be determined in other manners without using the aging classification model, which is not limited herein.
Step S104, determining the aging of the content to be processed according to the first aging category.
The aging corresponding to the first aging category can be determined as the aging of the content to be processed. For example, if the aging corresponding to the long aging category is 30 days and the aging corresponding to the short aging category is 3 days, then the aging of the content to be processed is 30 days when the first aging category is the long aging category, and 3 days when the first aging category is the short aging category.
Of course, in practical application, the aging of the content to be processed may also be determined according to the first aging category and related information of the content to be processed.
As an optional implementation manner, the related information of the content to be processed may include a content category of the content to be processed, and the smaller of the aging corresponding to the first aging category and the aging corresponding to the content category may be determined as the aging of the content to be processed.
For example, if the first aging category is the long aging category, the aging corresponding to the long aging category is 30 days, the content category of the content to be processed is the movie category, and the aging corresponding to the movie category is 7 days, then the aging of the content to be processed is determined to be 7 days.
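The minimum rule above can be sketched as a small lookup. The table names, the category labels `"long"`, `"short"`, and `"movie"`, and the function `aging_days` are hypothetical; the day values are the ones used in the examples in the text.

```python
# Example values from the text: long = 30 days, short = 3 days, movie = 7 days.
AGING_BY_AGING_CATEGORY_DAYS = {"long": 30, "short": 3}
AGING_BY_CONTENT_CATEGORY_DAYS = {"movie": 7}


def aging_days(aging_category, content_category=None):
    """Return the aging in days, capped by the content category's aging if known."""
    base = AGING_BY_AGING_CATEGORY_DAYS[aging_category]
    limit = AGING_BY_CONTENT_CATEGORY_DAYS.get(content_category)
    # With no aging configured for the content category, use the category aging.
    return base if limit is None else min(base, limit)


movie_long = aging_days("long", "movie")
```

Here `movie_long` reproduces the 30-day versus 7-day example, where the smaller value is taken.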
Specifically, step S104 determines the aging of the content to be processed according to the first aging category, and specifically may include step S1041 and step S1042.
Step S1041, if the first aging category is the first category, determining an aging of the content to be processed based on an aging corresponding to the first category.
In this embodiment, the first category may be the short aging category, and the aging corresponding to the short aging category may be preset. When the first aging category of the content to be processed is the short aging category, the aging of the content to be processed is determined to be the preset aging corresponding to the short aging category.
For example, the aging corresponding to the short aging category is preset to be 3 days, and when the first aging category of one piece of information is the short aging category, the aging of the information is 3 days.
In the embodiment of the application, if the first aging category is the short aging category, the aging category of the content to be processed is the short aging category. Because the aging of the content to be processed is short, there is no need to further determine its aging category by using time keywords, which reduces the processing steps and the time required to determine the aging.
Step S1042, if the first aging category is the second category, determining the aging of the content to be processed based on the time keyword in the text content, where the aging corresponding to the second category is greater than the aging corresponding to the first category.
In this embodiment of the application, the second category may be the long aging category. When the first aging category of the content to be processed is the long aging category, the aging of the content to be processed may be determined based on the time keyword in the text content.
Of course, the aging corresponding to the long aging category may also be set in advance. Normally, the aging corresponding to the long aging category is greater than the aging corresponding to the short aging category; for example, the aging corresponding to the long aging category is 30 days.
The time keyword is a word or phrase related to time; for example, the time keyword may be "recently", "currently", "in recent days", "a while ago", "two days ago", "now showing", "at present", and the like.
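The branching between steps S1041 and S1042 can be sketched as a small dispatcher. The names `determine_aging` and `stub_keyword_stage` and the string labels are hypothetical; the keyword-based stage is passed in as a callable because its details are described separately.

```python
def determine_aging(first_category, text, aging_from_time_keywords,
                    short_aging_days=3):
    """Return the aging in days for content with the given first aging category."""
    if first_category == "short":
        # Step S1041: use the preset aging; the keyword stage is skipped.
        return short_aging_days
    if first_category == "long":
        # Step S1042: defer to the time-keyword-based stage.
        return aging_from_time_keywords(text)
    raise ValueError(f"unknown aging category: {first_category!r}")


calls = []


def stub_keyword_stage(text):
    """Stand-in for the keyword stage; records calls and returns 30 days."""
    calls.append(text)
    return 30


short_result = determine_aging("short", "some text", stub_keyword_stage)
long_result = determine_aging("long", "some text", stub_keyword_stage)
```

The stub is called only for the long aging category, which illustrates why the short branch saves processing time.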
Step S105, performing corresponding processing according to the aging of the content to be processed.
In different application scenarios, when corresponding processing is performed according to the aging of the content to be processed, the processing modes may be the same or different.
As an alternative implementation, the application scenario may be a scenario in which the application program has already recommended the content to be processed to the user. In this scenario, the content to be processed is recommended content, and performing corresponding processing according to the aging of the content to be processed in step S105 may specifically include:
determining the recommended time of the recommended content, that is, how long the content has been recommended; and deleting the recommended content if the difference between the recommended time and the aging of the recommended content is not less than a set value (the set value here may be a first set value).
In the embodiment of the application, recommended content is displayed in a display interface of the application program. Once the application program has recommended the content to the user, the recommended time of the recommended content can be determined, and the recommended content is deleted if the difference between the recommended time and the aging of the recommended content is not less than the set value.
The set value may be 0 or any positive number.
When the set value is 0 and the difference between the recommended time and the aging of the recommended content is not less than 0, the recommended content is outdated and is deleted, so that it is no longer displayed in the display interface of the application program. This ensures that the content displayed and recommended by the application program is not outdated, and improves the user experience.
When the set value is a positive number, the recommended content is deleted shortly before it becomes outdated. This prevents outdated content from remaining in the application program because deletion was delayed by factors such as the network or latency, ensures that the content displayed by the application program is not outdated, and improves the user experience.
For example, the aging of a piece of information is 16 hours. When the set value is 0, the information can be deleted when its recommended time reaches 16 hours; when the set value is 10 minutes, the information can be deleted when its recommended time reaches 15 hours and 50 minutes.
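The deletion check for recommended content can be sketched as follows. The comparison is written to reproduce the worked example above, in which a set value of 10 minutes triggers deletion at 15 hours 50 minutes; that reading of the set value as an advance margin is an assumption, and the function name `should_delete_recommended` is hypothetical.

```python
from datetime import timedelta


def should_delete_recommended(recommended_for, aging, set_value=timedelta(0)):
    """Return True once the content has been recommended for at least
    (aging - set_value), so a positive set value deletes slightly early."""
    if set_value < timedelta(0):
        raise ValueError("set_value must be zero or positive")
    return recommended_for >= aging - set_value


aging = timedelta(hours=16)
ten_minutes = timedelta(minutes=10)
at_15h50 = timedelta(hours=15, minutes=50)
delete_early = should_delete_recommended(at_15h50, aging, ten_minutes)
delete_at_expiry = should_delete_recommended(timedelta(hours=16), aging)
```

With a set value of 0 the content is kept at 15 hours 50 minutes and deleted at 16 hours; with a set value of 10 minutes it is already deleted at 15 hours 50 minutes.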
As another alternative implementation, the application scenario may be a scenario in which the application program has not yet recommended the content to be processed to the user. In general, this is a personalized display scenario, that is, the content displayed in the display interface of the application program is personalized according to the preferences of the user. In this scenario, the content to be processed is content to be recommended, and performing corresponding processing according to the aging of the content to be processed in step S105 may specifically include:
determining the release time corresponding to the content to be recommended, calculating the time difference between the current time and the release time, and deleting the content to be recommended if the difference between this time difference and the aging of the content to be recommended is not less than a set value (the set value here may be a second set value).
In the embodiment of the application, the content to be recommended is not displayed in the display interface of the application, the application can correspond to a resource pool, the resource pool comprises a large amount of content to be recommended, and a user can request to update the current recommended content of the application. In the process of updating the current recommended content of the application program, the content to be recommended, which is desired by the user, can be determined from the resource pool, and the determined content to be recommended is recommended to the user, that is, the determined content to be recommended is displayed on the display interface of the application program.
Each content to be recommended in the resource pool corresponds to one release time and one aging. The time difference between the current time and the release time of the content to be recommended can be determined, and when the difference between this time difference and the aging of the content to be recommended is not less than the set value, the content to be recommended is deleted from the resource pool.
The set value can be 0 or any positive number.
When the set value is 0 and the difference between the time difference and the aging of the content to be recommended is not less than 0, the content to be recommended is outdated and is deleted from the resource pool. This ensures that no outdated content remains in the resource pool, so that the content selected from the resource pool when the current recommended content of the application program is updated, and then displayed on the display interface, is not outdated, and outdated content is not recommended to the user.
When the set value is a positive number, the content to be recommended is about to become outdated and is deleted from the resource pool at that point. This prevents outdated content from remaining because deletion was delayed by factors such as the network or latency, and prevents outdated content from being recommended to the user.
For example, a piece of information is released at 2:00 on October 1 and its aging is 16 hours. When the set value is 0, the information can be deleted when the current time is 18:00 on October 1; when the set value is 10 minutes, the information can be deleted when the current time is 17:50 on October 1.
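Pruning the resource pool can be sketched as follows. As with the worked example, a positive set value is treated as an advance margin, which is an assumption about the intended comparison. The dictionary keys, the function names, and the year 2020 are hypothetical; the source gives only the month, day, and hour.

```python
from datetime import datetime, timedelta


def is_outdated(released_at, aging, now, set_value=timedelta(0)):
    """True when the time since release has reached (aging - set_value)."""
    return (now - released_at) >= aging - set_value


def prune_pool(pool, now, set_value=timedelta(0)):
    """Return only the items in the pool that are not yet outdated."""
    return [
        item for item in pool
        if not is_outdated(item["released_at"], item["aging"], now, set_value)
    ]


# The year is an arbitrary placeholder for the October 1, 2:00 example.
pool = [{"id": "news-1",
         "released_at": datetime(2020, 10, 1, 2, 0),
         "aging": timedelta(hours=16)}]
at_17_50 = datetime(2020, 10, 1, 17, 50)
at_18_00 = datetime(2020, 10, 1, 18, 0)
kept_at_17_50 = prune_pool(pool, at_17_50)
kept_at_18_00 = prune_pool(pool, at_18_00)
kept_with_margin = prune_pool(pool, at_17_50, timedelta(minutes=10))
```

The item survives at 17:50 with a set value of 0, and is removed at 18:00, or already at 17:50 when the set value is 10 minutes.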
Compared with the prior art, the embodiment of the application provides a data processing method based on aging. The aging category of the content to be processed is determined according to the text features of the text content in the content to be processed. If the aging category is the first category, the aging of the content to be processed is determined based on the aging corresponding to the first category; if the aging category is the second category, the aging is determined based on the time keywords in the text content. In other words, the aging is determined from the text features of the whole text content in the content to be processed, which greatly improves the accuracy of the determined aging. Corresponding processing is then performed according to the aging of the content to be processed, so that the information the application program recommends to the user is valid, non-outdated information, and the user experience is improved.
It should be noted that the context of the text content is rich and varied, and the same words have different meanings represented in different contexts, and the same is applicable to the time-related words.
For example, consider two contexts for the time-related word "recently". The first: "Recently, a large-scale campaign against pornography and organized crime was launched in a city in China and made breakthrough progress." The second: "Time flies. Twenty years ago I was still a junior middle school student, and recently I leafed through an old album that had been put away for a long time."
Of these two contexts, the first is a short aging context, so "recently" is a valid time keyword in it, while the second is a long aging context, so "recently" is not a valid time keyword in it. In the prior art, when time keywords are used to determine the aging of an article, the article is judged to be a short aging article as soon as the word "recently" appears. As a result, a large number of articles that should be long aging articles are wrongly filtered out, some high-quality long aging articles cannot be effectively recommended, and the content on the recommendation side is affected.
According to the present application, the time keywords in the text content and their context information can be extracted and considered together, so that the aging category indicated by each time keyword in its context is accurately identified. That is, the time keyword is classified according to the text features of its context information.
In another possible implementation of the embodiment of the present application, step S1042, namely determining the aging of the content to be processed based on the time keyword in the text content, may further include step S106: extracting the time keyword in the text content and the context information of the time keyword.
In the embodiment of the application, the second category may be the long aging category. When the first aging category of the content to be processed is the long aging category, the time keyword in the text content and the context information of the time keyword may be extracted.
As an optional implementation manner, in step S106, extracting the time keyword in the text content may specifically include: and determining and extracting time keywords in the text content according to the text content and a pre-constructed keyword library.
In the embodiment of the application, the keyword library comprises at least two time keywords, and the time keywords contained in the text content can be matched according to the keyword library, and the matched time keywords are extracted.
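Matching against the keyword library can be sketched with simple substring search. This is only one possible matching strategy and an assumption; the source does not say whether matching is done on raw text or on segmented words. The function name `extract_time_keywords` and the sample library are hypothetical.

```python
def extract_time_keywords(text, keyword_library):
    """Return (position, keyword) pairs for every library keyword found in text."""
    matches = []
    for keyword in keyword_library:
        if not keyword:
            continue  # Ignore empty entries in the library.
        start = text.find(keyword)
        while start != -1:
            matches.append((start, keyword))
            start = text.find(keyword, start + 1)
    # Sort by position so keywords come out in reading order.
    return sorted(matches)


library = {"recently", "a while ago"}
sample = "a while ago we moved; recently we returned"
found = extract_time_keywords(sample, library)
```

Keeping the position of each match is convenient for locating the sentence that contains the keyword in a later step.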
The keyword library is constructed in the following way:
acquiring at least one seed time keyword; acquiring candidate words; determining target time keywords among the candidate words based on the similarity between each candidate word and the seed time keyword; and constructing the keyword library based on the seed time keywords and the target time keywords.
In the embodiment of the application, some seed time keywords can be obtained through manual collation, for example, "in recent days", "a while ago", "two days ago", "now showing", and "as of now".
A large amount of text information can be acquired and segmented into a large number of words, and these words can be collated to obtain at least one candidate word. Typically, the number of candidate words is large; for example, there may be 500,000 candidate words.
The word vector of each seed time keyword and the word vector of each candidate word can be determined by using the word library constructed as described above.
Further, the word vector of the seed time keyword and the word vector of each candidate word can be used for calculating the similarity value between the seed time keyword and the candidate word in an unsupervised manner, and the target time keyword in each candidate word is determined according to the similarity value. The similarity value may be a cosine similarity value, and the manner of determining the target time keyword in each candidate word according to the similarity value is not limited.
It should be noted that, for each seed time keyword, a similarity value between the seed time keyword and each candidate word may be calculated, or the similarity value between the seed time keyword and each candidate word may not be calculated.
As an optional implementation manner, after the similarity values between each seed time keyword and each candidate word are obtained, the candidate words whose similarity value is larger than a similarity threshold may be selected as target time keywords. The similarity threshold may be preset, and its value is not limited.
As another optional implementation manner, for each seed time keyword, the candidate words are sorted from high to low according to their similarity to the seed time keyword, and the first preset number of candidate words are taken as target time keywords. The preset number is not limited and may be, for example, 200.
When the similarity is measured by a cosine value for which a smaller value indicates a higher similarity (for example, a cosine distance), sorting the values from small to large is the same as sorting the similarities from high to low.
As shown in fig. 2, fig. 2 is a schematic diagram of determining a target time keyword according to an embodiment of the present application. After the cosine similarity values between seed time keyword 1 and candidate words 1 to N are calculated, the values are sorted in the order described above and the first n candidate words are taken as target time keywords, where n and N are positive integers and N is larger than n.
Further, the keyword library may be constructed based on the seed time keywords and the target time keywords. Of course, in actual execution, after the preset number of candidate words are taken as target time keywords, the target time keywords can be manually screened to ensure the accuracy of the time keywords.
Screening target time keywords from the candidate words by means of the seed time keywords recalls many potential time keywords and expands the keyword library. After the target time keywords are screened out, they can be screened a second time manually. With this small amount of manual correction, a sufficient number of time keywords is obtained and their accuracy is ensured.
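The library construction steps above can be sketched with cosine similarity over word vectors. The toy 2-dimensional vectors, the example words, and the function names are hypothetical; in practice the vectors would come from the word bank. Candidates are ranked with the most similar first, and both the threshold variant and the top-n variant are shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def build_keyword_library(seed_vectors, candidate_vectors,
                          top_n=None, threshold=None):
    """Return the seeds plus candidates selected by top-n and/or threshold."""
    targets = set()
    for seed_vec in seed_vectors.values():
        scored = sorted(
            ((cosine_similarity(seed_vec, vec), word)
             for word, vec in candidate_vectors.items()
             if word not in seed_vectors),
            reverse=True,  # Most similar candidates first.
        )
        if threshold is not None:
            scored = [(s, w) for s, w in scored if s > threshold]
        if top_n is not None:
            scored = scored[:top_n]
        targets.update(word for _, word in scored)
    return set(seed_vectors) | targets


seeds = {"recently": [1.0, 0.0]}
candidates = {"lately": [0.9, 0.1],
              "yesterday": [0.7, 0.3],
              "building": [0.0, 1.0]}
library_top2 = build_keyword_library(seeds, candidates, top_n=2)
library_strict = build_keyword_library(seeds, candidates, threshold=0.95)
```

The manual second screening described above would then be applied to the returned set before it is used for matching.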
In the embodiment of the application, the context information of the time keyword comprises at least one of information A1-information A3.
Information A1, the target sentence in which the time keyword is located.
Information A2, at least one sentence located before and adjacent to the target sentence.
Information A3, at least one sentence located after and adjacent to the target sentence.
Where information A2 or information A3 includes more than one sentence, the sentences are consecutive.
For example, the context information of the time keyword "a while ago" may include only the sentence in which the keyword is located: "A while ago, the editor found a landmark building on Central Street." Alternatively, it may include that sentence together with the sentence that follows and is adjacent to it: "A while ago, the editor found a landmark building on Central Street. The landmark building is located on the right side of the church and is a Roman-style building."
Typically, at least one sentence is included in the textual content. Any sentence may or may not include the time keyword, and the time keyword may be located at any position such as the head, middle, or tail of the sentence, which is not limited herein.
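Extracting the context information (the target sentence plus adjacent sentences before and after it) can be sketched as follows. The sentence splitter is a deliberately naive assumption that only handles English-style punctuation followed by whitespace, and the function names and the `before`/`after` parameters are hypothetical.

```python
import re


def split_sentences(text):
    """Naively split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keyword_context(text, keyword, before=0, after=0):
    """Return, for each sentence containing the keyword, that sentence joined
    with up to `before` preceding and `after` following adjacent sentences."""
    sentences = split_sentences(text)
    contexts = []
    for i, sentence in enumerate(sentences):
        if keyword.lower() in sentence.lower():
            start = max(0, i - before)
            contexts.append(" ".join(sentences[start:i + after + 1]))
    return contexts


article = ("We visited the city. "
           "A while ago, the editor found a landmark building on Central Street. "
           "The landmark building is located on the right side of the church. "
           "It is open daily.")
context_a1 = keyword_context(article, "a while ago")
context_a1_a3 = keyword_context(article, "a while ago", after=1)
```

With `before=0, after=0` only information A1 is returned; raising `before` or `after` adds the consecutive neighboring sentences of information A2 or A3.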
In step S1042, the determining the aging of the content to be processed based on the time keyword in the text content may specifically include step S10421 and step S10422.
Step S10421, determining a second aging category of the content to be processed according to the time keyword and the context information of the time keyword, where the second aging category is the first category or the second category.
In the embodiment of the present application, the first category may be a short aging category, and the second category may be a long aging category. It may be determined whether the second aging category of the to-be-processed content is a long aging category or a short aging category according to the time keyword and the context information of the time keyword.
It should be noted that the text content in the content to be processed includes at least one time keyword, and when the number of the time keywords is 1, the second aging category of the content to be processed may be determined by using the time keyword and the context information of the time keyword; when the number of the time keywords is at least two, it is necessary to determine a second aging category of the content to be processed by using each time keyword and the context information of each time keyword. Specifically, the method comprises the following steps:
when the number of the time keywords is at least two, in step S10421, determining a second aging category of the content to be processed according to the time keywords and the context information of the time keywords, which may specifically include:
for each time keyword, determining an aging category corresponding to the time keyword according to the time keyword and the context information of the time keyword; when the aging categories corresponding to the time keywords are the second categories, determining that the second aging category of the content to be processed is the second category; and when at least one aging category exists in the aging categories corresponding to the time keywords and is the first category, determining that the second aging category of the content to be processed is the first category.
In the embodiment of the application, for each time keyword, the aging category of the time keyword can be determined according to the time keyword and the context information of the time keyword, and the aging category of the time keyword can be a first category or a second category.
When the aging categories corresponding to all of the time keywords are the second category, this indicates that, taking the context information into account, every time keyword corresponds to the long aging category. Because every time keyword corresponds to the long aging category, and the first aging category of the content to be processed was also determined to be the long aging category according to the text features of the text content, the second aging category of the content to be processed can finally be determined to be the second category, namely the long aging category.
For a time keyword, if the aging category corresponding to the time keyword is the first category, this indicates that, taking the context information of the time keyword into account, the time keyword corresponds to the short aging category. In the embodiment of the application, when at least one of the aging categories corresponding to the time keywords is the short aging category, the second aging category of the content to be processed is determined to be the first category, namely the short aging category.
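The aggregation rule for multiple time keywords can be sketched in a few lines. The function name `second_aging_category` and the string labels `"short"` and `"long"` are hypothetical stand-ins for the first and second categories.

```python
def second_aging_category(keyword_categories):
    """Combine per-keyword aging categories into one category for the content.

    Any "short" keyword makes the content "short"; it is "long" only when
    every keyword is "long".
    """
    categories = list(keyword_categories)
    if not categories:
        raise ValueError("at least one time keyword category is required")
    return "short" if any(c == "short" for c in categories) else "long"


all_long = second_aging_category(["long", "long", "long"])
one_short = second_aging_category(["long", "short", "long"])
```

A single short aging keyword is enough to override the long aging result of the first stage, which keeps the rule conservative about recommending possibly outdated content.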
As an optional implementation manner, in step S10421, determining a second aging category of the to-be-processed content according to the time keyword and the context information of the time keyword, which may specifically include:
extracting the characteristics of the time keywords and the text characteristics of the context information of the time keywords; and determining a second aging category of the content to be processed according to the characteristics of the time keywords and the text characteristics of the context information of the time keywords.
In the embodiment of the application, when the time keyword is a single word, a word vector of the time keyword can be determined based on the word bank, and this word vector is the feature of the time keyword; when the time keyword includes at least two words, the time keyword can be segmented to obtain each word, a word vector of each word is determined based on the word bank, and the word vector of the time keyword, which is the feature of the time keyword, is obtained from the respective word vectors of the at least two words. As an optional implementation manner, the word vectors of the at least two words may be accumulated and normalized to obtain the word vector of the time keyword.
Generally, the context information of the time keyword includes at least one sentence, and any sentence is composed of at least two words, so that the context information of the time keyword can be segmented to obtain each word, a word vector of each word is determined based on a word bank, and the text features of the context information of the time keyword are obtained by using the word vector of each word.
The features of the temporal keywords may be fused with the textual features of the contextual information of the temporal keywords. As an alternative implementation, during the fusion, the feature of the time keyword may be spliced before or after the text feature of the context information of the time keyword.
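The feature extraction and splicing described above can be sketched as follows. This is a sketch under stated assumptions: the toy two-dimensional word bank, the function names, and the choice of L2 normalization are all illustrative, since the application only says that the word vectors are accumulated and normalized.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sum_and_normalize(vectors: Sequence[Vector]) -> Vector:
    """Accumulate word vectors element-wise, then scale to unit L2 norm."""
    if not vectors:
        raise ValueError("need at least one word vector")
    total = [sum(component) for component in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in total))
    return total if norm == 0.0 else [x / norm for x in total]


def text_feature(words: Sequence[str], word_bank: Dict[str, Vector]) -> Vector:
    """Look up each segmented word in the word bank and pool the vectors."""
    return sum_and_normalize([word_bank[w] for w in words if w in word_bank])


def fuse(keyword_feature: Vector, context_feature: Vector) -> Vector:
    """Splice the keyword feature in front of the context feature."""
    return keyword_feature + context_feature


# Toy word bank (assumption); real vectors would come from a trained model.
WORD_BANK = {"this": [1.0, 0.0], "morning": [0.0, 1.0], "match": [3.0, 4.0]}

keyword_vec = text_feature(["this", "morning"], WORD_BANK)
context_vec = text_feature(["match"], WORD_BANK)
fused = fuse(keyword_vec, context_vec)  # length = 2 + 2
```

The spliced vector is what would be passed to the aging classification model; splicing after the context feature instead only changes the argument order of `fuse`.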
In the embodiment of the application, the second aging category of the content to be processed is determined according to the features of the time keywords and the text features of the context information of the time keywords; the manner of determination is not limited.
As an optional implementation manner, the features of the time keyword may be spliced before the text features of the context information of the time keyword, and the spliced features are input, as input information, into a trained aging classification model (this aging classification model is a second aging classification model, which is a different model from the first aging classification model). The aging classification model outputs an aging category, and this output is used as the second aging category of the content to be processed.
The aging classification model can be a binary classification model, and the aging category output by the binary classification model can be the long aging category or the short aging category. The binary classification model can be a classifier.
The training sample set of the aging classification model can be pre-constructed, and details can be found in the relevant description of the foregoing embodiments, which are not described herein again.
Step S10422, determining the aging of the content to be processed based on the aging corresponding to the second aging category.
In the above description, the aging corresponding to the short aging category and the aging corresponding to the long aging category are set in advance, the second aging category is the first category or the second category, the first category is the short aging category, and the second category is the long aging category. When the second aging category of the content to be processed is the long aging category, determining the aging of the content to be processed as the aging corresponding to the long aging category; when the second aging category of the content to be processed is the short aging category, the aging of the content to be processed can be determined as the aging corresponding to the short aging category.
In the embodiment of the application, if the first aging category is the long aging category, the text features indicate that the content to be processed has a long aging. To avoid content that is actually short-aging being determined as long-aging, the aging of the content to be processed can be further determined by using the time keywords. If the aging determined according to the time keywords is long, the aging category of the content to be processed is indeed the long aging category, and the aging is not corrected; if the aging determined according to the time keywords is short, the aging category of the content to be processed should be the short aging category, and the aging needs to be corrected. This further correction allows the aging of the content to be processed to be determined accurately.
In another possible implementation manner of the embodiment of the present application, the method may further include: and acquiring the content category of the content to be processed.
Content may be divided into different content categories, and the manner of division is not limited. For example, content categories may include sports, film, science and technology, finance, entertainment, society, and so forth. The content to be processed may have at least one content category; for example, its content categories may be film and society.
Step S104, determining the aging of the content to be processed according to the first aging category, which may specifically include: and determining the aging of the content to be processed according to the aging corresponding to the first aging category and the aging corresponding to the content category.
Specifically, in step S1041, determining the aging of the content to be processed based on the aging corresponding to the first category may specifically include: determining the aging of the content to be processed based on the aging corresponding to the first category and the aging corresponding to the content category.
Step S1042, determining the aging of the content to be processed based on the time keyword in the text content, may specifically include: determining the aging of the content to be processed based on the time keywords in the text content and the aging corresponding to the content category.
In the embodiment of the application, the correspondence between content category and aging can be preset. For example, the correspondence can be set as follows:
Content category           Aging (days)
Sports                     3
Film                       7
Science and technology     3
Finance and economics      2
Entertainment              3
Society                    2
……                         ……
When the first aging category is the short aging category, the aging of the content to be processed may be determined to be the aging corresponding to the short aging category or the aging corresponding to the content category.
For example, if the first aging category is the short aging category, the aging corresponding to the short aging category is 3 days, the content categories are film and society, the aging corresponding to film is 7 days, and the aging corresponding to society is 2 days, then the aging of the content to be processed may be determined to be 2 days, the smallest of these values.
When the first aging category is the long aging category and the second aging category is also the long aging category, the aging of the content to be processed may be determined to be the aging corresponding to the long aging category or the aging corresponding to the content category; of course, it may also be determined to be the smaller of the aging corresponding to the long aging category and the aging corresponding to the content category.
When the first aging category is the long aging category and the second aging category is the short aging category, the aging of the content to be processed may be determined to be the aging corresponding to the short aging category or the aging corresponding to the content category; of course, it may also be determined to be the smaller of the aging corresponding to the short aging category and the aging corresponding to the content category.
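Combining the aging category with the content category can be sketched as follows. The table values and the 3-day and 30-day agings come from the examples in this description; taking the minimum is only one of the options the text allows, and the function and dictionary names are assumptions made for illustration.

```python
from typing import Iterable

# Preset correspondence between content category and aging, in days.
CATEGORY_AGING_DAYS = {
    "sports": 3,
    "film": 7,
    "science and technology": 3,
    "finance and economics": 2,
    "entertainment": 3,
    "society": 2,
}

# Aging preset for the short and long aging categories, in days.
AGING_DAYS = {"short": 3, "long": 30}


def content_aging_days(aging_category: str, content_categories: Iterable[str]) -> int:
    """Return the smallest aging among the aging category and content categories."""
    candidates = [AGING_DAYS[aging_category]]
    candidates += [CATEGORY_AGING_DAYS[c] for c in content_categories]
    return min(candidates)
```

For the example above, `content_aging_days("short", ["film", "society"])` compares 3, 7, and 2 days and returns 2.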
The above embodiments describe the aging-based data processing method in detail from the perspective of method steps. To better understand and explain the solutions provided in the embodiments of the present application, an optional embodiment is described below with reference to a specific application scenario. In this application scenario, the solution provided by the embodiment of the present application may be applied to an information recommendation application, and the content to be processed is an information article that has already been recommended to users. Based on the aging-based data processing solution of the present application, the information articles that the application recommends to users are articles that are not outdated, and information articles with a long aging can continue to be recommended effectively.
Fig. 3 is a schematic structural diagram of an information recommendation system to which the aging-based data processing method of the present application is applied. As shown in fig. 3, the system includes a server 320 of an application program and user terminal devices (fig. 3 shows a user terminal device 310 and a user terminal device 311) that communicate with the server 320. The application is installed on the user terminal device 310 and the user terminal device 311, and the server 320 of the application can recommend information articles in the application. Fig. 4 and fig. 5 are each a schematic diagram of determining the aging of an information article according to an embodiment of the present application; the description below refers to fig. 3, fig. 4, and fig. 5.
In the embodiment of the present application, the server 320 may obtain an information article recommended online by an application, and extract text content from the information article, where the extracted text content includes a title and a body.
As shown in fig. 4, the server 320 may perform word segmentation on the title, determine a word vector of each word after word segmentation, and then accumulate and normalize the word vectors to obtain the text features of the title; similarly, the server 320 may perform word segmentation on the body, determine a word vector of each word after word segmentation, and then accumulate and normalize the word vectors to obtain the text features of the body. The word vector of any word can be determined by using a trained Word2vec model.
Further, the server 320 may splice the text features of the title with the text features of the body to obtain the text features of the text content, and then input the text features of the text content into the trained XGBoost model, which outputs the first aging category of the information article.
When the first aging category of the information article is the short aging category, the aging of the information article is determined to be the aging corresponding to the short aging category, namely 3 days.
When the first aging category of the information article is the long aging category, the aging category of the information article can be further judged. As shown in fig. 5, the server 320 may determine whether the text content includes a time keyword; if it does, the server 320 obtains the time keyword and the context information of the time keyword, and outputs an aging category of the information article by using the time keyword, the context information of the time keyword, and an aging classification model.
The aging classification model can determine the text features of the time keyword and the text features of the context information of the time keyword, splice the text features of the context information after the text features of the time keyword, and input the spliced text features into the aging classifier, which outputs the second aging category of the information article.
When the second aging category is the long aging category, the server 320 determines that the aging of the information article is the aging corresponding to the long aging category, that is, 30 days; when the second aging category is a short aging category, the server 320 determines that the aging of the information article is the aging corresponding to the short aging category, i.e. 3 days.
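The two-stage decision described in this scenario can be sketched as follows. The classifiers are passed in as plain callables; the toy stand-ins at the bottom are purely illustrative and are not the XGBoost model or the aging classifier of the embodiment. How an article with no time keyword is handled is not stated in this passage, so keeping the long aging in that case is an assumption.

```python
from typing import Callable, List, Tuple

SHORT, LONG = "short", "long"
AGING_DAYS = {SHORT: 3, LONG: 30}  # values taken from the example in the text

KeywordHit = Tuple[str, str]  # (time keyword, its context information)


def determine_aging_days(
    text: str,
    first_classifier: Callable[[str], str],
    find_keywords: Callable[[str], List[KeywordHit]],
    second_classifier: Callable[[str, str], str],
) -> int:
    """Two-stage aging decision for one information article."""
    # Stage 1: classify the whole text content.
    if first_classifier(text) == SHORT:
        return AGING_DAYS[SHORT]
    # Stage 2: a long-aging result is re-checked against the time keywords.
    hits = find_keywords(text)
    # Assumption: with no time keyword, the stage-1 result is kept.
    if any(second_classifier(kw, ctx) == SHORT for kw, ctx in hits):
        return AGING_DAYS[SHORT]
    return AGING_DAYS[LONG]


# Toy stand-ins (hypothetical), only to make the sketch executable.
def toy_first(text: str) -> str:
    return SHORT if "breaking" in text else LONG


def toy_find(text: str) -> List[KeywordHit]:
    return [(kw, text) for kw in ("tonight", "every year") if kw in text]


def toy_second(keyword: str, context: str) -> str:
    return SHORT if keyword == "tonight" else LONG
```

With these stand-ins, an article containing "tonight" is corrected from 30 days to 3 days, while one containing only "every year" keeps the 30-day aging.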
Further, the server 320 can determine the recommended time of the information article online in the application, and delete the information article when the recommended time reaches the aging corresponding to the information article, that is, take the information article offline promptly, thereby optimizing the user experience.
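The take-down check can be sketched as follows, assuming that the recommended time is measured as the time elapsed since the article was first recommended. That interpretation, the function name, and the `margin` parameter (standing in for the set value) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_expired(
    recommended_at: datetime,
    now: datetime,
    aging_days: int,
    margin: timedelta = timedelta(0),
) -> bool:
    """True when the article has been online for at least its aging plus the margin."""
    elapsed = now - recommended_at
    return elapsed - timedelta(days=aging_days) >= margin
```

A scheduler could call `is_expired` periodically for every online article and remove those for which it returns `True`.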
The above method steps specifically illustrate the data processing method based on aging, and the following introduces the data processing apparatus based on aging from the perspective of virtual modules, specifically as follows:
the embodiment of the present application provides an aging-based data processing apparatus, and as shown in fig. 6, the aging-based data processing apparatus 60 may include: a content acquisition module 601, an aging category determination module 602, an aging determination module 603, and a content processing module 604, wherein,
the content obtaining module 601 is configured to obtain content to be processed, where the content to be processed includes text content.
The aging category determining module 602 is configured to determine a text feature of the text content, and determine a first aging category of the content to be processed according to the text feature of the text content.
The aging determining module 603 is configured to determine, when the first aging category is a first category, an aging of the content to be processed based on an aging corresponding to the first category, and determine, when the first aging category is a second category, an aging of the content to be processed based on a time keyword in the text content, where the aging corresponding to the second category is greater than the aging corresponding to the first category.
And a content processing module 604, configured to perform processing according to the aging of the content to be processed.
In another possible implementation manner of the embodiment of the present application, the aging-based data processing apparatus 60 further includes a keyword extraction module, wherein,
and the keyword extraction module is used for extracting the time keywords in the text content and the context information of the time keywords.
The aging determining module 603 is specifically configured to, when determining the aging of the content to be processed based on the time keyword in the text content:
determining a second aging category of the content to be processed according to the time keyword and the context information of the time keyword, wherein the second aging category is the first category or the second category;
and determining the aging of the content to be processed based on the aging corresponding to the second aging category.
In another possible implementation manner of the embodiment of the present application, when determining the second aging category of the content to be processed according to the time keyword and the context information of the time keyword, the aging determination module 603 is specifically configured to:
extracting the characteristics of the time keywords and the text characteristics of the context information of the time keywords;
and determining a second aging category of the content to be processed according to the characteristics of the time keywords and the text characteristics of the context information of the time keywords.
In another possible implementation manner of the embodiment of the present application, when the number of the time keywords is at least two, the aging determination module 603 is specifically configured to, when determining the second aging category of the content to be processed according to the time keywords and the context information of the time keywords:
for each time keyword, determining an aging category corresponding to the time keyword according to the time keyword and the context information of the time keyword;
when the aging categories corresponding to the time keywords are the second categories, determining that the second aging category of the content to be processed is the second category;
and when at least one aging category exists in the aging categories corresponding to the time keywords and is the first category, determining that the second aging category of the content to be processed is the first category.
In another possible implementation manner of the embodiment of the present application, the context information of the time keyword includes at least one of the following:
the target sentence where the time keyword is located; at least one sentence that is located before the target sentence and is adjacent to the target sentence; and at least one sentence that is located after the target sentence and is adjacent to the target sentence.
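Collecting this context information can be sketched as follows. The punctuation-based sentence splitter and the `window` parameter are simplifying assumptions; the application does not specify how sentences are delimited.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: a sentence ends at ., !, ? or their full-width forms."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def keyword_context(text: str, keyword: str, window: int = 1) -> List[List[str]]:
    """For each sentence containing the keyword, return it with its neighbours."""
    sentences = split_sentences(text)
    contexts = []
    for i, sentence in enumerate(sentences):
        if keyword in sentence:
            start = max(0, i - window)
            contexts.append(sentences[start : i + window + 1])
    return contexts


ARTICLE = "It rained all week. The final is tonight. Tickets are sold out."
```

Here `keyword_context(ARTICLE, "tonight")` yields the target sentence together with one adjacent sentence on each side; a keyword in the first sentence yields only the target sentence and the following one.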
In another possible implementation manner of the embodiment of the present application, the aging determining module 603 is specifically configured to, when extracting a time keyword in the text content:
determining and extracting time keywords in the text content according to the text content and a pre-constructed keyword library;
the keyword library is constructed in the following way:
acquiring at least one seed time keyword;
acquiring each candidate word;
determining a target time keyword in each candidate word based on the similarity between each candidate word and the seed time keyword;
and constructing a keyword library based on each seed time keyword and the target time keywords.
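The construction steps above can be sketched as follows. Cosine similarity over word vectors and the 0.8 threshold are assumptions chosen for illustration; this passage only says that target time keywords are selected by their similarity to the seed time keywords.

```python
import math
from typing import Dict, Iterable, List, Set

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def build_keyword_library(
    seeds: Iterable[str],
    candidates: Iterable[str],
    vectors: Dict[str, Vector],
    threshold: float = 0.8,
) -> Set[str]:
    """Keep every seed, plus each candidate close enough to some seed."""
    seed_list = list(seeds)
    library = set(seed_list)
    for word in candidates:
        if word not in vectors:
            continue
        best = max(
            (cosine(vectors[word], vectors[s]) for s in seed_list if s in vectors),
            default=0.0,
        )
        if best >= threshold:
            library.add(word)
    return library


# Toy vectors (assumption); real ones would come from a trained embedding model.
TOY_VECTORS = {"today": [1.0, 0.0], "tonight": [0.9, 0.1], "football": [0.0, 1.0]}
library = build_keyword_library(["today"], ["tonight", "football"], TOY_VECTORS)
```

In this toy example "tonight" is close to the seed "today" and is added as a target time keyword, while "football" is not.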
In another possible implementation manner of the embodiment of the present application, the content obtaining module 601 is further configured to:
acquiring the content category of the content to be processed;
the aging determining module 603 is specifically configured to, when determining the aging of the content to be processed based on the aging corresponding to the first category:
and determining the aging of the content to be processed based on the aging corresponding to the first category and the aging corresponding to the content category.
The aging determining module 603 is specifically configured to, when determining the aging of the content to be processed based on the time keyword in the text content:
and determining the aging of the content to be processed based on the time keywords in the text content and the aging corresponding to the content category.
In another possible implementation manner of the embodiment of the present application, the text content includes a title and a body, and the aging category determining module 602 is specifically configured to, when determining the text feature of the text content:
extracting text features of the title and text features of the body;
and fusing the text features of the title and the text features of the body to obtain the text features of the text content.
In another possible implementation manner of the embodiment of the present application, the aging category determining module 602 is specifically configured to, when fusing the text feature of the title and the text feature of the body to obtain the text feature of the text content:
and splicing the text features of the title and the text features of the body to obtain the text features of the text content.
In another possible implementation manner of the embodiment of the application, the content to be processed is recommended content;
the content processing module 604 is specifically configured to:
determining a recommended time of the recommended content;
and when the difference value between the recommended time and the aging of the recommended content is not less than the set value, deleting the recommended content.
The aging-based data processing apparatus 60 of this embodiment can execute the aging-based data processing method provided in this embodiment of the present application, which is similar to the above-mentioned embodiment, and therefore, the implementation principle is not described herein again.
Compared with the prior art, the aging-based data processing apparatus determines the aging category of the content to be processed according to the text features of the text content in the content to be processed. If the aging category of the content to be processed is the first category, the aging of the content to be processed is determined based on the aging corresponding to the first category; if the aging category is the second category, the aging is determined based on the time keywords in the text content. In other words, the aging of the content to be processed is determined according to the text features of the whole text content in the content to be processed, which greatly improves the accuracy of the determined aging. Further, corresponding processing is performed according to the aging of the content to be processed, so that the information recommended to the user by the application program is valid information that is not outdated, and the user experience is improved.
The above describes the aging-based data processing apparatus of the present application from the perspective of a virtual module, and the following describes the electronic device of the present application from the perspective of a physical device.
An embodiment of the present application provides an electronic device. As shown in fig. 7, the electronic device 4000 includes a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiment of the present application.
Processor 4001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 4002 may include a path that carries information between the aforementioned components. Bus 4002 may be a PCI bus, EISA bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
Memory 4003 may be, but is not limited to, a ROM or another type of static storage device that can store static information and instructions, a RAM or another type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing computer programs for executing the present scheme, and is controlled by the processor 4001 for execution. Processor 4001 is configured to execute a computer program stored in memory 4003 to implement what is shown in any of the foregoing method embodiments.
An embodiment of the present application provides an electronic device, where the electronic device includes: a memory and a processor, wherein the memory has stored therein a computer program; the processor, when running the computer program, performs the aging-based data processing method shown in the method embodiments.
The electronic device of the present application is described above from the perspective of a physical device, and the computer-readable storage medium of the present application is described below from the perspective of a storage medium.
The embodiment of the application provides a computer-readable storage medium, wherein a computer program is stored in the storage medium, and when being executed by a processor, the computer program realizes the data processing method based on the aging shown in the method embodiment.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations to which the above-described method embodiments relate.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also fall within the protection scope of the present application.

Claims (14)

1. A data processing method based on aging is characterized by comprising the following steps:
acquiring content to be processed, wherein the content to be processed comprises text content;
determining text features of the text content;
determining a first aging category of the content to be processed according to the text features of the text content;
if the first aging category is a first category, determining the aging of the content to be processed based on the aging corresponding to the first category;
if the first aging category is a second category, determining the aging of the content to be processed based on the time keywords in the text content, wherein the aging corresponding to the second category is greater than the aging corresponding to the first category;
and processing according to the aging of the content to be processed.
2. The method of claim 1, wherein before determining the aging of the content to be processed based on the time keyword in the text content, the method further comprises:
extracting time keywords in the text content and context information of the time keywords;
the determining the aging of the content to be processed based on the time keyword in the text content comprises:
determining a second aging category of the content to be processed according to the time keyword and the context information of the time keyword, wherein the second aging category is the first category or the second category;
and determining the aging of the content to be processed based on the aging corresponding to the second aging category.
3. The method of claim 2, wherein determining the second aging category of the content to be processed according to the time keyword and the context information of the time keyword comprises:
extracting the characteristics of the time keywords and the text characteristics of the context information of the time keywords;
and determining a second aging category of the content to be processed according to the characteristics of the time keywords and the text characteristics of the context information of the time keywords.
4. The method of claim 2, wherein if the time keywords are at least two, the determining the second aging category of the to-be-processed content according to the time keywords and the context information of the time keywords comprises:
for each time keyword, determining an aging category corresponding to the time keyword according to the time keyword and the context information of the time keyword;
if the aging category corresponding to each time keyword is the second category, determining that the second aging category of the content to be processed is the second category;
and if at least one aging category exists in the aging categories corresponding to the time keywords and is the first category, determining that the second aging category of the content to be processed is the first category.
5. The method according to any of claims 2-4, wherein the context information of the temporal keyword comprises at least one of:
a target sentence where the time keyword is located;
at least one sentence that is located before the target sentence and is adjacent to the target sentence;
at least one sentence that follows the target sentence and is adjacent to the target sentence.
6. The method of claim 2, wherein the extracting the temporal keyword from the text content comprises:
determining and extracting time keywords in the text content according to the text content and a pre-constructed keyword library;
the keyword library is constructed in the following way:
acquiring at least one seed time keyword;
acquiring each candidate word;
determining a target time keyword in each candidate word based on the similarity between each candidate word and the seed time keyword;
and constructing the keyword library based on each seed time keyword and the target time keywords.
7. The method of claim 1, further comprising:
acquiring the content category of the content to be processed;
the determining the aging of the content to be processed based on the aging corresponding to the first category includes:
determining the aging of the content to be processed based on the aging corresponding to the first category and the aging corresponding to the content category;
the determining the aging of the content to be processed based on the time keyword in the text content comprises:
and determining the aging of the content to be processed based on the time keywords in the text content and the aging corresponding to the content category.
8. The method of claim 1, wherein the text content includes a title and a body, and wherein determining the text characteristic of the text content includes:
extracting text features of the title and text features of the body;
and fusing the text features of the title and the text features of the body to obtain the text features of the text content.
9. The method of claim 8, wherein fusing the text features of the title and the text features of the body to obtain the text features of the text content comprises:
and splicing the text features of the title and the text features of the body to obtain the text features of the text content.
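The splicing in claim 9 corresponds to vector concatenation. A minimal sketch, assuming the title and body features are already available as plain lists of floats (the function name is hypothetical, and feature extraction itself is out of scope here):

```python
from typing import List, Sequence


def fuse_by_concatenation(
    title_features: Sequence[float],
    body_features: Sequence[float],
) -> List[float]:
    """Splice title and body feature vectors end to end, title first."""
    return list(title_features) + list(body_features)
```

The fused vector has a length equal to the sum of the two input lengths, so a downstream classifier sees both the title and the body signal.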
10. The method according to claim 1, wherein the content to be processed is recommended content;
the processing according to the aging of the content to be processed comprises the following steps:
determining a recommended time of the recommended content;
and if the difference between the recommended time and the aging of the recommended content is not less than a set value, deleting the recommended content.
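The deletion test in claim 10 can be sketched as below. This example assumes that the recommended time is expressed as a duration (how long the content has been recommended) and that the aging is a duration as well; the claim does not fix these units, and the names `should_delete`, `recommended_for`, and `set_value` are hypothetical.

```python
from datetime import timedelta


def should_delete(
    recommended_for: timedelta,
    aging: timedelta,
    set_value: timedelta = timedelta(0),  # assumed default threshold
) -> bool:
    """Return True when (recommended time - aging) is not less than the set value."""
    return recommended_for - aging >= set_value
```

With a set value of zero, content is removed as soon as it has been recommended for at least as long as its aging.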
11. An aging-based data processing apparatus, comprising:
the content acquisition module is used for acquiring the content to be processed, wherein the content to be processed comprises text content;
the aging category determining module is used for determining the text characteristics of the text content and determining a first aging category of the content to be processed according to the text characteristics of the text content;
the aging determining module is used for determining the aging of the content to be processed based on the aging corresponding to the first category when the first aging category is the first category, and determining the aging of the content to be processed based on the time keyword in the text content when the first aging category is the second category, wherein the aging corresponding to the second category is larger than the aging corresponding to the first category;
and the content processing module is used for processing according to the aging of the content to be processed.
12. The apparatus of claim 11, further comprising a keyword extraction module;
the keyword extraction module is used for extracting time keywords in the text content and context information of the time keywords;
the aging determining module, when determining the aging of the content to be processed based on the time keyword in the text content, is specifically configured to:
determining a second aging category of the content to be processed according to the time keyword and the context information of the time keyword, wherein the second aging category is the first category or the second category;
and determining the aging corresponding to the second aging category as the aging of the content to be processed.
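The two-stage decision performed by the aging determining module in claims 11 and 12 can be sketched as follows. The classifier and the keyword-based step are passed in as callables because the claims do not fix their implementation; the concrete aging values in `AGING` are hypothetical and are chosen only so that the second category has the larger aging, as the claim requires.

```python
from datetime import timedelta
from typing import Callable

FIRST, SECOND = "first", "second"

# Hypothetical aging values; the claim only requires SECOND > FIRST.
AGING = {FIRST: timedelta(days=1), SECOND: timedelta(days=7)}


def determine_aging(
    text: str,
    classify: Callable[[str], str],
    keyword_category: Callable[[str], str],
) -> timedelta:
    """Stage 1: classify from text features. Stage 2: refine with time keywords."""
    first_aging_category = classify(text)
    if first_aging_category == FIRST:
        # The text-feature classifier already found the content short-lived.
        return AGING[FIRST]
    # Classifier says long-lived, so re-check using time keywords and context.
    second_aging_category = keyword_category(text)
    return AGING[second_aging_category]
```

The keyword stage only runs when the first stage returns the second category, so it can shorten the aging but never lengthen it.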
13. An electronic device, comprising a memory and a processor, wherein the memory has stored therein a computer program; the processor, when executing the computer program, performs the method of any of claims 1 to 10.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 10.
CN202011217879.7A 2020-11-04 2020-11-04 Aging-based data processing method and device, electronic equipment and storage medium Active CN113407714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011217879.7A CN113407714B (en) 2020-11-04 2020-11-04 Aging-based data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011217879.7A CN113407714B (en) 2020-11-04 2020-11-04 Aging-based data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113407714A (en) 2021-09-17
CN113407714B (en) 2024-03-12

Family

ID=77677419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011217879.7A Active CN113407714B (en) 2020-11-04 2020-11-04 Aging-based data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113407714B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019047849A1 (en) * 2017-09-05 2019-03-14 腾讯科技(深圳)有限公司 News processing method, apparatus, storage medium and computer device
CN110737783A (en) * 2019-10-08 2020-01-31 腾讯科技(深圳)有限公司 method, device and computing equipment for recommending multimedia content
CN110929018A (en) * 2019-12-04 2020-03-27 Oppo(重庆)智能科技有限公司 Text processing method and device, storage medium and electronic equipment
CN110990705A (en) * 2019-12-06 2020-04-10 腾讯科技(深圳)有限公司 News processing method, device, equipment and medium
CN111177462A (en) * 2020-01-03 2020-05-19 百度在线网络技术(北京)有限公司 Method and device for determining video distribution timeliness

Also Published As

Publication number Publication date
CN113407714B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
JP7201730B2 (en) Intention recommendation method, device, equipment and storage medium
CN112203122B (en) Similar video processing method and device based on artificial intelligence and electronic equipment
CN110489558B (en) Article aggregation method and device, medium and computing equipment
CN105210064B (en) Classifying resources using deep networks
US11613008B2 (en) Automating a process using robotic process automation code
US20200327190A1 (en) Personalized book-to-movie adaptation recommendation
US20130157234A1 (en) Storyline visualization
US11151323B2 (en) Embedding natural language context in structured documents using document anatomy
CN110362663B (en) Adaptive multi-perceptual similarity detection and analysis
US10083031B2 (en) Cognitive feature analytics
JP7539201B2 (en) Rare Topic Detection Using Hierarchical Clustering
US11328019B2 (en) Providing causality augmented information responses in a computing environment
CN113761868A (en) Text processing method and device, electronic equipment and readable storage medium
US20220335270A1 (en) Knowledge graph compression
US11734602B2 (en) Methods and systems for automated feature generation utilizing formula semantification
US10769386B2 (en) Terminology proposal engine for determining target language equivalents
US11804245B2 (en) Video data size reduction
US20230161948A1 (en) Iteratively updating a document structure to resolve disconnected text in element blocks
US11475211B1 (en) Elucidated natural language artifact recombination with contextual awareness
US11615245B2 (en) Article topic alignment
CN113407714B (en) Aging-based data processing method and device, electronic equipment and storage medium
CN111723177B (en) Modeling method and device of information extraction model and electronic equipment
US20200302005A1 (en) Comment-based article augmentation
US20200272648A1 (en) Text Extraction and Processing
Hirchoua et al. Topic hierarchies for knowledge capitalization using hierarchical Dirichlet processes in big data context

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40053143; Country of ref document: HK)

SE01 Entry into force of request for substantive examination
GR01 Patent grant