CN115357790A - Method, device, medium and electronic equipment for identifying key segments in content



Publication number
CN115357790A
Authority
CN
China
Legal status
Pending
Application number
CN202210982950.3A
Other languages
Chinese (zh)
Inventor
康小明
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210982950.3A
Publication of CN115357790A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application provide a method, a device, a computer-readable medium and an electronic device for identifying key segments in content. The method comprises: dividing the content into a plurality of segments; acquiring behavior data generated when a plurality of objects access each segment in the content; determining the criticality of each segment in the content according to that behavior data, where the criticality measures how important the segment is within the content; and identifying at least one key segment among the plurality of segments of the content according to the criticality of each segment. The method and device can automatically extract the key segments of the content that are of higher value and more worth accessing; providing these key segments to users improves their information acquisition efficiency and saves their time. The embodiments of the present application can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving.

Description

Method, device, medium and electronic equipment for identifying key segments in content
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for identifying key segments in content, a computer-readable medium, and an electronic device.
Background
With the development of the internet, especially the mobile internet, the information age has arrived, and content of all kinds, such as articles, videos and pictures, floods toward people like a tide.
Although the internet gives users a convenient way to acquire content, much of that content contains a large amount of information that is hardly worth browsing. Such information takes up a great deal of users' reading time without providing anything of value, which lowers their information acquisition efficiency and degrades the user experience.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, a computer-readable medium and an electronic device for identifying key segments in content, which can, at least to some extent, automatically extract the key segments in content, so that providing the key segments to a user improves the user's information acquisition efficiency and saves the user's time.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method for identifying key segments in content, the method including: dividing the content into a plurality of segments; acquiring behavior data generated when a plurality of objects access each segment in the content; determining the criticality of each segment in the content according to the behavior data generated when the plurality of objects access each segment, where the criticality measures how important the segment is within the content; and identifying at least one key segment among the plurality of segments of the content according to the criticality of each segment.
According to an aspect of an embodiment of the present application, there is provided an apparatus for identifying key segments in content, the apparatus including: a dividing unit configured to divide the content into a plurality of segments; a behavior data acquisition unit configured to acquire behavior data generated when a plurality of objects access each segment in the content; a determining unit configured to determine the criticality of each segment in the content according to the behavior data generated when the plurality of objects access each segment, where the criticality measures how important the segment is within the content; and an identification unit configured to identify at least one key segment among the plurality of segments of the content according to the criticality of each segment.
In some embodiments of the present application, based on the foregoing scheme, the content is an article, and the behavior data acquisition unit is configured to: acquire the exposure duration of every exposure of each segment of the article while the article is read by the plurality of objects, thereby obtaining the exposure durations of the multiple exposures of each segment, where an exposure duration is the time from when the segment appears on the screen until it disappears from the screen.
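A minimal sketch of how these exposure durations might be collected, assuming (hypothetically) that the reading client logs one "appear" event and one "disappear" event per segment per exposure, each tagged with a session ID and a timestamp in seconds. The log schema and the function name `exposure_durations` are illustrative assumptions, not part of the source.

```python
from collections import defaultdict


def exposure_durations(events):
    """Pair appear/disappear events into one duration per exposure.

    events: iterable of (session_id, segment_id, kind, timestamp_seconds),
    where kind is "appear" or "disappear" (hypothetical log schema).
    Returns {segment_id: [duration_seconds, ...]}.
    """
    open_since = {}  # (session_id, segment_id) -> time the segment appeared
    durations = defaultdict(list)
    # Process events in time order so each "disappear" closes its own exposure.
    for session_id, segment_id, kind, ts in sorted(events, key=lambda e: e[3]):
        key = (session_id, segment_id)
        if kind == "appear":
            open_since[key] = ts
        elif kind == "disappear" and key in open_since:
            durations[segment_id].append(ts - open_since.pop(key))
    return dict(durations)


events = [
    ("u1", 0, "appear", 0.0), ("u1", 0, "disappear", 5.0),
    ("u2", 0, "appear", 1.0), ("u2", 0, "disappear", 3.0),
    ("u1", 1, "appear", 4.0), ("u1", 1, "disappear", 10.0),
]
print(exposure_durations(events))  # {0: [2.0, 5.0], 1: [6.0]}
```

Keying open exposures by session as well as segment keeps overlapping readers (u1 and u2 above) from closing each other's exposures.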
In some embodiments of the present application, based on the foregoing scheme, the determining unit is configured to: normalize, for each segment, the exposure durations of the segment's multiple exposures recorded while the article is read by the plurality of objects, to obtain the normalized exposure duration of each segment; and determine the criticality of each segment in the article according to its normalized exposure duration.
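This section does not specify how a segment's multiple exposure durations are normalized. One plausible reading, sketched below purely as an assumption, collapses each segment's exposures into a mean duration, min-max scales those means across the article, and uses the scaled value directly as the criticality. The helper names `min_max` and `normalized_exposure_duration` are hypothetical.

```python
from statistics import mean


def min_max(values):
    """Scale a {key: number} mapping into [0, 1]; a constant input maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def normalized_exposure_duration(durations):
    """durations: {segment_id: [per-exposure seconds]} -> {segment_id: score}.

    Assumption: a segment's exposures are collapsed into their mean before
    scaling across segments; the source leaves the exact scheme open.
    """
    per_segment = {seg: mean(ds) for seg, ds in durations.items() if ds}
    return min_max(per_segment)


crit = normalized_exposure_duration({0: [2.0, 4.0], 1: [6.0], 2: [1.0, 1.0]})
print(crit)  # {0: 0.4, 1: 1.0, 2: 0.0}
```

Dividing each duration by the segment's character count before averaging would be another reasonable variant, since longer segments naturally stay on screen longer.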
In some embodiments of the present application, based on the foregoing scheme, the behavior data acquisition unit is further configured to: acquire the number of exposures of each segment in the article while the article is read by the plurality of objects; and the determining unit is further configured to: normalize the exposure counts of all segments in the article to obtain the normalized exposure count of each segment, and determine the criticality of each segment in the content according to its normalized exposure duration and normalized exposure count.
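The normalized exposure duration and normalized exposure count could be combined in several ways. The sketch below assumes a simple weighted sum with a hypothetical mixing weight `alpha`, using min-max scaling for the counts; neither the weighted sum nor the value of `alpha` comes from the source.

```python
def min_max(values):
    """Scale a {key: number} mapping into [0, 1]; a constant input maps to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def criticality(norm_duration, exposure_counts, alpha=0.5):
    """Blend normalized duration and normalized exposure count per segment.

    alpha is a hypothetical mixing weight: 1.0 uses duration only,
    0.0 uses exposure count only. Missing entries count as 0.
    """
    norm_count = min_max(exposure_counts)
    segments = set(norm_duration) | set(norm_count)
    return {
        seg: alpha * norm_duration.get(seg, 0.0) + (1 - alpha) * norm_count.get(seg, 0.0)
        for seg in sorted(segments)
    }


print(criticality({0: 0.4, 1: 1.0, 2: 0.0}, {0: 10, 1: 30, 2: 20}))
# {0: 0.2, 1: 1.0, 2: 0.25}
```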
In some embodiments of the present application, based on the foregoing scheme, the behavior data further includes active operation record data, and the determining unit is configured to: for each segment in the article, weight the exposure duration of at least one exposure of the segment according to the active operation record data generated while the segment was exposed, and replace that exposure duration with the corresponding weighted exposure duration; and normalize the multiple exposure durations of each segment, which include the weighted exposure durations, to obtain the normalized exposure duration of each segment.
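This section names neither the kinds of active operations nor the size of each weight. The sketch below assumes hypothetical operations (like, comment, copy, share) with made-up boost values in `OP_WEIGHTS`, and scales an exposure's duration by one plus the summed boosts of the operations recorded during that exposure; exposures without operations keep their original duration.

```python
# Hypothetical boosts per active operation; the source gives no values.
OP_WEIGHTS = {"like": 0.5, "comment": 1.0, "copy": 0.8, "share": 1.0}


def weighted_durations(exposures):
    """exposures: {segment_id: [(duration_seconds, [operation, ...]), ...]}.

    Each exposure's duration is multiplied by 1 + the summed boosts of the
    operations recorded during it; unknown operations add nothing.
    """
    return {
        seg: [d * (1.0 + sum(OP_WEIGHTS.get(op, 0.0) for op in ops)) for d, ops in records]
        for seg, records in exposures.items()
    }


print(weighted_durations({0: [(4.0, []), (2.0, ["like", "comment"])]}))
# {0: [4.0, 5.0]}
```

The weighted list can then be passed to the same normalization step used for unweighted durations.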
In some embodiments of the present application, based on the foregoing scheme, the behavior data acquisition unit is further configured to: acquire every exposure record of each segment in the article while the plurality of objects read the article, and calculate an original exposure count from each exposure record; for each segment in the article, weight the original exposure count of at least one exposure record of the segment according to the active operation record data generated while the segment was exposed, to obtain a weighted original exposure count; and, for each segment in the article, obtain the number of exposures of the segment from the weighted original exposure counts of those exposure records together with the original exposure counts of the other exposure records.
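Along the same lines, a hypothetical weighted exposure count might let each exposure record contribute an original count of 1, boost that count with the same kind of assumed operation weights, and sum the records per segment. `OP_WEIGHTS` and `weighted_exposure_count` are illustrative names only.

```python
# Hypothetical boosts per active operation; the source gives no values.
OP_WEIGHTS = {"like": 0.5, "comment": 1.0, "copy": 0.8, "share": 1.0}


def weighted_exposure_count(records):
    """records: {segment_id: [[operation, ...] for each exposure record]}.

    Every exposure record contributes an original count of 1; records with
    active operations are boosted by the (assumed) operation weights.
    """
    return {
        seg: sum(1.0 + sum(OP_WEIGHTS.get(op, 0.0) for op in ops) for ops in recs)
        for seg, recs in records.items()
    }


print(weighted_exposure_count({0: [[], [], ["share"]], 1: [["like"]]}))
# {0: 4.0, 1: 1.5}
```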
In some embodiments of the present application, based on the foregoing scheme, the identification unit is configured to: determine first candidate key segments among the plurality of segments of the content according to the criticality of each segment; extract behavior features of each segment from the behavior data generated when the plurality of objects access each segment; input the behavior features of each segment into a pre-built artificial intelligence model to obtain the predicted probability value output by the model for that segment; determine second candidate key segments among the plurality of segments according to the predicted probability value of each segment; and determine the key segments according to the first candidate key segments and the second candidate key segments.
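The rule for combining the two candidate sets is not stated here. The sketch below assumes top-k selection on criticality for the first set, a probability threshold for the second, and set intersection to fuse them; `predict` stands in for the pre-built model and is any callable returning a probability. All names and parameters (`top_k`, `threshold`, `toy_model`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set


def first_candidates(crit: Dict[int, float], top_k: int) -> Set[int]:
    """Top-k segments by criticality (assumed selection rule)."""
    ranked = sorted(crit, key=crit.get, reverse=True)
    return set(ranked[:top_k])


def second_candidates(
    features: Dict[int, Sequence[float]],
    predict: Callable[[Sequence[float]], float],
    threshold: float,
) -> Set[int]:
    """Segments whose predicted key-segment probability reaches the threshold."""
    return {seg for seg, f in features.items() if predict(f) >= threshold}


def key_segments(crit, features, predict, top_k=3, threshold=0.5) -> List[int]:
    """Assumed fusion rule: keep segments that appear in both candidate sets."""
    return sorted(first_candidates(crit, top_k) & second_candidates(features, predict, threshold))


def toy_model(f: Sequence[float]) -> float:
    """Toy stand-in for the pre-built model: the first feature is the probability."""
    return f[0]


crit = {0: 0.9, 1: 0.1, 2: 0.7, 3: 0.8}
features = {0: [0.9], 1: [0.2], 2: [0.3], 3: [0.6]}
print(key_segments(crit, features, toy_model))  # [0, 3]
```

Intersection favors precision; taking the union instead would favor recall.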
In some embodiments of the present application, based on the foregoing scheme, after at least one key segment is identified among the plurality of segments of the content according to the criticality of each segment, the identification unit is further configured to: when a target object accesses the content, provide the content to the target object with the key segments in the content highlighted.
In some embodiments of the present application, based on the foregoing scheme, after at least one key segment is identified among the plurality of segments of the content according to the criticality of each segment, the identification unit is further configured to: when a target object accesses the content, provide the target object with an operation control in the display interface of the content; and, in response to the target object triggering the operation control, display only the key segments of the content in the display interface.
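For the two display modes above, a minimal rendering sketch, assuming HTML output in which `<mark>` highlights key segments and a hypothetical `key_only` flag models the state of the operation control. The function `render` and its parameters are illustrative only.

```python
import html


def render(segments, key_ids, key_only=False):
    """Return HTML for the content.

    segments: list of segment texts in reading order.
    key_ids: indices of key segments, wrapped in <mark> for highlighting.
    key_only: models the operation control; when True, non-key segments are dropped.
    """
    parts = []
    for i, text in enumerate(segments):
        if i in key_ids:
            parts.append(f"<mark>{html.escape(text)}</mark>")
        elif not key_only:
            parts.append(html.escape(text))
    return "".join(parts)


segs = ["Intro. ", "Key point. ", "Filler."]
print(render(segs, {1}))                 # Intro. <mark>Key point. </mark>Filler.
print(render(segs, {1}, key_only=True))  # <mark>Key point. </mark>
```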
According to an aspect of embodiments of the present application, there is provided a computer-readable medium storing a computer program which, when executed by a processor, implements the method for identifying key segments in content as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for identifying key segments in content as described in the above embodiments.
According to an aspect of embodiments of the present application, there is provided a computer program product, which includes computer instructions stored in a computer-readable storage medium, and a processor of a computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions to cause the computer device to perform the method for identifying key segments in content as described in the above embodiments.
In the technical solutions provided in some embodiments of the present application, after the content is divided into a plurality of segments, behavior data generated when a plurality of objects access each segment is acquired, and the criticality of each segment, which measures how important the segment is within the content, is determined from that behavior data; finally, the key segments in the content are determined according to the criticality of each segment. The method can therefore use big data to automatically extract from the content the key segments that are of higher value and more worth accessing, and providing these key segments to users improves their information acquisition efficiency and saves their time.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a diagram illustrating an application scenario of a text summarization technique in a search engine in the related art;
FIG. 2 is a diagram illustrating an application scenario of a text summarization technique in news event extraction in the related art;
FIG. 3 shows a schematic page diagram of an article with low text-to-title relevance according to an embodiment of the application;
FIG. 4 illustrates a page view of a lengthy article according to one embodiment of the present application;
FIG. 5 shows a page schematic of a clickbait ("headline party") article according to one embodiment of the present application;
FIG. 6 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 7 shows a flow diagram of a method of identifying key segments in content according to one embodiment of the present application;
FIG. 8 shows a flowchart of details of step 720 in FIG. 7 according to one embodiment of the present application;
FIG. 9 shows a flowchart of details of step 730 in FIG. 8 according to one embodiment of the present application;
FIG. 10 illustrates a flow diagram for obtaining the number of exposures of each segment in an article while the article is read by multiple objects according to one embodiment of the present application;
FIG. 11 shows a flowchart of steps subsequent to step 740 in FIG. 7, according to one embodiment of the present application;
FIG. 12 shows a flowchart of steps subsequent to step 740 in FIG. 7, according to another embodiment of the present application;
FIG. 13 shows a schematic overall flow diagram of a scheme according to an embodiment of the present application;
FIG. 14 shows a block diagram of an apparatus for identifying key segments in content according to one embodiment of the present application;
FIG. 15 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
With the advent of the information age, users accessing content on the internet seem adrift in an ocean of information, much of which is worthless to them. Many pieces of content contain little valuable information, and users must spend considerable effort filtering it, which wastes a great deal of their time.
In the related art, the technologies that can process information automatically are mainly text summarization technology and automatic news writing technology.
Text summarization technology can automatically generate or extract the central idea of an article and use it as a summary, thereby condensing the long text of the article into a much shorter summary. It can be applied in scenarios such as search engines and news event extraction.
A search engine returns the websites that match the input keywords, and it can rely on text summarization technology to decide which websites match. Specifically, the search engine may extract a summary from the article on each website and regard a website as matching the keyword when the keyword appears in that website's summary. FIG. 1 is a schematic diagram illustrating an application scenario of text summarization technology in a search engine in the related art. Referring to FIG. 1, when a user enters the keyword "text summarization technology" into a search engine, the search engine returns the corresponding search results, and the summary of each website in the results matches "text summarization technology". For example, the summary of the first search result, shown in the rectangular box, contains "text summary" and "technique" and therefore matches "text summarization technology".
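The matching step described for the related art can be illustrated with a deliberately simple rule, assumed here only as a toy: a website matches when every whitespace-separated keyword term occurs, case-insensitively, in its summary. The functions `matches` and `search` and the example URLs are hypothetical.

```python
def matches(keyword: str, summary: str) -> bool:
    """True if every whitespace-separated keyword term occurs in the summary."""
    text = summary.lower()
    return all(term in text for term in keyword.lower().split())


def search(keyword: str, summaries: dict) -> list:
    """Return the URLs whose summaries match the keyword, in input order."""
    return [url for url, summary in summaries.items() if matches(keyword, summary)]


summaries = {
    "https://example.com/a": "Text summarization condenses a long article.",
    "https://example.com/b": "Image retrieval with deep features.",
}
print(search("text summarization", summaries))  # ['https://example.com/a']
```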
In news event extraction, short news events can be extracted by means of text summarization technology. FIG. 2 is a schematic diagram illustrating an application scenario of text summarization technology in news event extraction in the related art. Referring to FIG. 2, the entries in the trending lists provided by many websites are news events extracted by text summarization technology.
In addition, the related art includes automatic news writing technology, which can quickly generate simple reports on certain news events and is often used for breaking news or sports reporting.
However, the main purpose of text summarization technology is to extract the content segments most relevant to a search intent, not the key segments that capture the essence of an article; and although automatic news writing technology produces little redundant information, its usage scenarios are relatively limited and the news it writes is relatively short.
Therefore, the related art cannot identify key segments in an article. Moreover, self-media articles are becoming ever more numerous, and the related art clearly cannot cope with such a large volume of self-media articles or process the information in them effectively.
To this end, the present application first provides a method for identifying key segments in content. The method provided by the embodiments of the present application overcomes the above defects and can identify the essential key segments in large and complicated content, so that users can obtain key segments from content such as articles more efficiently and acquire key information more simply and directly. This improves users' information acquisition efficiency, saves their time and improves the user experience. The method is particularly suited to articles with the following problems: low relevance between the body text and the title, excessive length, and clickbait titles ("headline party" articles).
An article with low text-to-title relevance is one whose body contains text and images only loosely related to the title, and users are easily confused by such low-value information. FIG. 3 shows a schematic page diagram of an article with low text-to-title relevance according to an embodiment of the application. Referring to FIG. 3, the article is titled "Study shows: the best fitness method XX", yet its body contains several items with little relevance to "the best fitness method"; these are marked with rounded rectangular boxes.
An overly lengthy article is one with an excessive word count. FIG. 4 illustrates a page view of a lengthy article according to one embodiment of the present application. The title of the article is related to banks, the body runs to a great length, and the bank-related content is scattered throughout the body.
A clickbait ("headline party") article is one whose title is exaggerated while its body is unrelated or only loosely related to the title. FIG. 5 shows a page schematic of a clickbait article according to one embodiment of the present application. As can be seen from FIG. 5, a clickbait article may attract users to click but has low value: only a small part of its body contains information worth reading, so the user's reading time is wasted.
By adopting the method for identifying the key segments in the content provided by the embodiment of the application to process the articles with the problems, the key segments in the articles can be automatically identified, and the key segments are directly provided for the user, so that the user can efficiently, directly and quickly acquire the most valuable information in the articles.
FIG. 6 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied. Referring to FIG. 6, the system architecture 600 may include a content uploading terminal 610, a server 620 and a plurality of user terminals, specifically a first user terminal 630, a second user terminal 640 and a third user terminal 650. Each user terminal, as well as the content uploading terminal 610, establishes a communication connection with the server 620. The content uploading terminal 610 runs a content uploading client, each user terminal runs an access client, and the server 620 hosts a server side that serves both the access clients on the user terminals and the content uploading client on the content uploading terminal 610. Taking the server 620 as the execution entity of the embodiments of the present application as an example, when the method for identifying key segments in content provided by the present application is applied to the system architecture shown in FIG. 6, one process may be as follows. First, after a user of the content uploading terminal 610 creates content, the content is uploaded to the server 620 through the content uploading client. Next, the server side of the server 620 identifies the key segments in the content by: dividing the content into a plurality of segments; sending the content to at least two user terminals in response to their content access requests; acquiring the behavior data generated when each user terminal accesses each segment in the content; determining the criticality of each segment according to the behavior data; and identifying at least one key segment among the plurality of segments according to the criticality of each segment. Finally, when a target user terminal requests access to the content, the server 620 may return the content with its key segments highlighted, or may return the content together with a button in its display interface; when the user of the target user terminal triggers the button, the key segments are highlighted or only the key segments are displayed. The user of the target user terminal can thus directly access the segments of higher value that are more worth accessing, which improves information acquisition efficiency.
In some embodiments of the present application, the content is any one of an article, a video, an audio clip or a picture, and may include pictures and/or text.
In some embodiments of the present application, the content uploaded by the content uploading terminal 610 to the server 620 is a self-media article.
It should be understood that the number of content upload terminals, servers, and user terminals in fig. 6 is merely illustrative. There may be any number of content upload terminals, servers, and user terminals, as desired for implementation. For example, the number of the content uploading terminals may be multiple, the server may be a server cluster formed by multiple servers, and the number of the user terminals may be less than three or more than three.
It should be noted that FIG. 6 shows only one embodiment of the present application. In the embodiment of FIG. 6, the content uploading terminal is a laptop, the execution entity is a server, the user terminals are smartphones, and the content uploading terminal and the user terminals are of different types; in other embodiments of the present application, however, the content uploading terminal, the execution entity and the user terminals may be any of various terminal devices such as desktop computers, laptops, iPads, smartphones and vehicle-mounted terminals, and the content uploading terminal and the user terminals may also be of the same type. Likewise, although the content in the embodiment of FIG. 6 is a self-media article, in other embodiments the content may be another type of article, such as an article from an authoritative media outlet, or even a book. And although the embodiment of FIG. 6 identifies the key segments by acquiring the behavior data generated when each user terminal accesses each segment in the content, in other embodiments the content may instead be sent to the terminal of a content approver, and the key segments may be identified from the behavior data generated when that terminal accesses each segment in the content. The embodiments of the present application impose no limitation in this respect, and the protection scope of the present application should not be limited thereby.
It is easy to understand that the method for identifying key segments in content provided by the embodiments of the present application is generally performed by a server, and accordingly, the device for identifying key segments in content is generally disposed in the server. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the scheme for identifying the key segments in the content provided by the embodiments of the present application.
Therefore, the scheme of the embodiment of the application can be applied to a terminal or a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and a big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 7 shows a flowchart of a method for identifying key segments in content according to an embodiment of the present application, which may be performed by various computing and processing-capable devices, such as a user terminal including but not limited to a mobile phone, a computer, an intelligent voice interaction device, an intelligent appliance, a vehicle-mounted terminal, a wearable device, or a cloud server. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, driving assistance and the like.
Referring to fig. 7, the method for identifying key segments in content at least includes the following steps:
in step 710, the content is divided into a plurality of segments.
The content can be any one of books, articles, videos, audios and pictures; when the content is an article, the content may include not only information such as characters and symbols, but also at least one type of information such as video, audio, pictures and tables.
The content can be divided according to various rules, and the dividing mode can be selected according to different content types. For example, when the content is audio or video, the content may be divided according to time length, and each minute is divided into one segment, so that the lengths of the segments are similar; when the content is a book, each page of the book may be divided into one segment.
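As an illustration of time-based division, the sketch below cuts an audio or video item into one-minute segments, with a shorter final segment when the length is not a whole number of minutes. The function name `split_by_duration` and its parameters are hypothetical; the document does not specify an implementation.

```python
def split_by_duration(total_seconds: float,
                      segment_seconds: float = 60.0) -> list[tuple[float, float]]:
    """Return (start, end) boundaries, in seconds, covering [0, total_seconds)."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    total = float(total_seconds)
    bounds: list[tuple[float, float]] = []
    start = 0.0
    while start < total:
        # Each segment spans segment_seconds, except possibly the last one.
        end = min(start + segment_seconds, total)
        bounds.append((start, end))
        start = end
    return bounds


print(split_by_duration(150))  # [(0.0, 60.0), (60.0, 120.0), (120.0, 150.0)]
```

Fixed-length windows keep segment lengths similar, which is the property the text asks for.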
In the following, the contents are taken as articles to further describe the scheme of the embodiment of the application.
In one embodiment of the present application, the content is an article, the article includes a plurality of sentences, and the dividing the content into a plurality of segments includes: the article is divided into a plurality of segments according to the sentence such that the number of characters in each segment does not exceed a predetermined number of characters.
The predetermined number of characters can be set as needed, for example to 150. Dividing the article in this way makes each segment at least one sentence long and at most several sentences long, with a total length not exceeding 150 characters.
In one embodiment of the present application, the article further includes at least one of a video, audio, a picture, and a table, and dividing the content into a plurality of segments further includes: dividing each video, audio clip, picture, or table in the article into a separate segment.
In one embodiment of the present application, dividing the article into a plurality of segments according to sentences so that the number of characters in each segment does not exceed the predetermined number of characters includes: starting from the first sentence of the article, extracting one sentence at a time from the unextracted sentences and adding it to the sentence set of the current round; if extracting the next sentence and adding it to the current round's sentence set would make the number of characters in the set exceed the predetermined number of characters, forming the current round's sentence set into the current segment; if the first sentence added to the current round's sentence set already exceeds the predetermined number of characters, dividing that sentence into two segments; and continuing to extract from the first unextracted sentence of the article and forming the next segment from the next round's sentence set, until all sentences of the article have been divided into corresponding segments.
If the first sentence added to the current round's sentence set has more than 150 characters, the sentence can be split evenly into two segments.
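A minimal Python sketch of the round-based sentence grouping described above follows. It assumes the article has already been split into sentences (sentence detection is not covered here) and uses the 150-character limit from the example; `segment_sentences` is a hypothetical name.

```python
def segment_sentences(sentences: list[str], max_chars: int = 150) -> list[str]:
    """Group consecutive sentences into segments of at most max_chars characters."""
    segments: list[str] = []
    i = 0
    while i < len(sentences):
        # Start a new round with the first unextracted sentence.
        current = sentences[i]
        i += 1
        if len(current) > max_chars:
            # The round's first sentence is already too long: split it evenly in two.
            mid = len(current) // 2
            segments.extend([current[:mid], current[mid:]])
            continue
        # Keep adding sentences while the round's set stays within the limit.
        while i < len(sentences) and len(current) + len(sentences[i]) <= max_chars:
            current += sentences[i]
            i += 1
        segments.append(current)
    return segments


print(segment_sentences(["aaa", "bbb", "cccc"], max_chars=6))  # ['aaabbb', 'cccc']
```

Note that an even two-way split of a sentence longer than twice the limit still yields halves above the limit; the sketch follows the text literally and does not split further.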
In one embodiment of the present application, dividing content into a plurality of segments includes: judging whether the content is an article of a preset type; and if the content is the article of the preset type, dividing the content into a plurality of segments.
Articles of the preset type may be, for example, self-media articles, whose quality varies widely. By identifying key segments only for self-media articles, the key segments can be identified and provided to the user in exactly the cases where an article's content value may be low, saving the user more time.
In one embodiment of the present application, dividing content into a plurality of segments includes: judging whether the word count of the article exceeds a predetermined word count; and if the word count of the article exceeds the predetermined word count, dividing the article into a plurality of segments.
An article with too many words is relatively lengthy; for such an article, identifying only the key segments and providing them to the user saves the user more time.
In one embodiment of the present application, dividing content into a plurality of segments includes: determining the degree of association between the title of the article and the content of the article; and if the degree of association is lower than a preset association threshold, dividing the article content into a plurality of segments.
A low association between the article title and the article content indicates that the content contains a large amount of information irrelevant to the title; in this case, identifying only the key segments and providing them to the user saves the user more time.
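The three triggering conditions above are described as separate embodiments. The sketch below, which is illustrative only, combines them so that any one of them triggers segmentation. The preset type label, the word-count limit of 3000, the association threshold of 0.5, and the association measure itself (the fraction of title tokens that also occur in the body) are all hypothetical choices; the document does not define how the degree of association is computed.

```python
import re


def title_coverage(title: str, body: str) -> float:
    """Hypothetical association measure: share of title tokens that occur in the body."""
    title_tokens = set(re.findall(r"\w+", title.lower()))
    body_tokens = set(re.findall(r"\w+", body.lower()))
    if not title_tokens:
        return 0.0
    return len(title_tokens & body_tokens) / len(title_tokens)


def should_segment(article_type: str, title: str, body: str,
                   preset_types: frozenset[str] = frozenset({"self_media"}),
                   max_words: int = 3000,
                   min_association: float = 0.5) -> bool:
    """Return True if any of the three triggering conditions holds."""
    # Character count stands in for word count, as is usual for Chinese text.
    word_count = len(body)
    return (article_type in preset_types
            or word_count > max_words
            or title_coverage(title, body) < min_association)
```

A production system would more likely use a semantic similarity model for the association check; token overlap is used here only to keep the sketch self-contained.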
In step 720, behavior data generated by the plurality of objects when accessing the respective segments of the content is obtained.
The object may be any entity capable of accessing the content, such as a user account operated by a user or the IP address of the terminal through which the user accesses the content.
Behavior data generated when the object accesses each segment in the content can be acquired by setting a corresponding script or code for the content.
When the content is an article, the user can access it by reading it as a web page.
When the content is an article, the object can be a user account of the user or an account of an approval person, the user logs in a content platform by using the user account to read the article, and the approval person can also log in the content platform by using the account to approve the article uploaded to the content platform, which is equivalent to accessing the content.
Each object can access each segment in the content, and different behavior data may be generated when different objects access the same segment.
FIG. 8 shows a flowchart of details of step 720 in FIG. 7 according to one embodiment of the present application. As shown in fig. 8, the step of acquiring behavior data generated when a plurality of objects access each segment of the content may specifically include the following steps:
in step 720', the exposure duration of each exposure of each segment in the article is obtained while the plurality of objects read the article, yielding the exposure durations corresponding to the multiple exposures of each segment, wherein the exposure duration is the time from when the segment appears on the screen to when it disappears from the screen.
Specifically, the behavior data in this embodiment of the application includes the exposure duration of a segment at each exposure. While the user reads the article, timing starts when the segment appears (or fully appears) on the screen and ends when the user scrolls the article so that the segment moves off the screen; the elapsed time is the exposure duration of that exposure. Each time a user reads the article, the segment is exposed once, and different users reading the same article each expose the same segment, so each segment accumulates multiple exposures.
When reading an article, users tend to linger longer on key segments that are more worth reading, so the exposure duration directly reflects how critical a segment is, and key segments can be identified accurately from the exposure duration of each segment at each exposure.
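The sketch below shows one way to turn raw screen events into per-exposure durations. It assumes a hypothetical client-side log in which each record is `(object_id, segment_id, kind, t_seconds)` with `kind` equal to `"appear"` or `"disappear"`, in chronological order for each object and segment; the document only says that a script or code attached to the content collects the behavior data, so the event format is an assumption.

```python
from collections import defaultdict
from collections.abc import Iterable


def exposure_durations(
        events: Iterable[tuple[str, str, str, float]]) -> dict[str, list[float]]:
    """Map segment_id -> list of exposure durations in seconds."""
    opened: dict[tuple[str, str], float] = {}
    durations: dict[str, list[float]] = defaultdict(list)
    for obj, seg, kind, t in events:
        key = (obj, seg)
        if kind == "appear":
            opened[key] = t  # exposure timer starts
        elif kind == "disappear" and key in opened:
            durations[seg].append(t - opened.pop(key))  # timer stops
    return dict(durations)


events = [("u1", "s1", "appear", 0.0), ("u1", "s1", "disappear", 5.0),
          ("u1", "s2", "appear", 5.0), ("u1", "s2", "disappear", 6.5),
          ("u2", "s1", "appear", 2.0), ("u2", "s1", "disappear", 5.0)]
print(exposure_durations(events))  # {'s1': [5.0, 3.0], 's2': [1.5]}
```

In a browser, visibility events of this kind are commonly produced with an intersection observer on each segment element.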
In step 730, the criticality of each segment in the content is determined according to the behavior data generated when the plurality of objects access each segment in the content, where the criticality measures how critical the segment is within the content.
The criticality may also be referred to as essence: the higher the criticality of a segment in the content, the more worthwhile the segment is to access.
FIG. 9 illustrates a flowchart of the details of step 730 in FIG. 7 according to one embodiment of the present application. Referring to fig. 9, the criticality of each segment in the content may be determined by the following steps:
in step 731, according to the exposure duration of each exposure of each segment in the article when the plurality of objects read the article, normalization processing is performed on the exposure durations respectively corresponding to the multiple exposures of each segment, so as to obtain the normalized exposure duration of each segment.
Specifically, the normalized exposure duration of each segment can be obtained using the following formula:
$$T_i = \frac{\sum_{j=1}^{n_i} t_i^{j}}{\sum_{k=1}^{m} \sum_{j=1}^{n_k} t_k^{j}}$$
where T_i is the normalized exposure duration of the i-th segment, t_i^j is the exposure duration of the i-th segment at its j-th exposure, n_i is the number of exposures of the i-th segment, m is the number of segments, and the denominator is the sum of the exposure durations of all exposures of all segments.
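Under this reading of the formula, the normalized exposure duration of a segment is its total exposure time divided by the total exposure time of all segments. A minimal sketch, with the hypothetical name `normalized_durations`:

```python
def normalized_durations(durations: dict[str, list[float]]) -> dict[str, float]:
    """T_i = (sum of segment i's exposure durations) / (sum over all segments)."""
    totals = {seg: sum(ds) for seg, ds in durations.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No exposure time recorded at all: every segment gets 0.
        return {seg: 0.0 for seg in totals}
    return {seg: total / grand_total for seg, total in totals.items()}


print(normalized_durations({"s1": [5.0, 3.0], "s2": [2.0]}))  # {'s1': 0.8, 's2': 0.2}
```

Because every T_i shares the same denominator, the values sum to 1 and are directly comparable across segments.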
In an embodiment of the application, before the exposure durations corresponding to the multiple exposures of each segment are normalized, the method for identifying key segments in content further includes:
eliminating the segments whose exposure duration does not exceed a preset exposure duration threshold.
The preset exposure duration threshold may be set according to the actual application scenario, for example to 1 second.
Segments with very short exposure durations are almost never key segments, so removing them in advance saves computation.
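One possible reading of the elimination step is sketched below: individual exposures that last no longer than the threshold (1 second, as in the example) are discarded, and a segment with no remaining exposures is dropped entirely. Whether the threshold applies per exposure or to a segment's aggregate duration is not specified in the text, so the per-exposure reading is an assumption.

```python
def drop_short_exposures(durations: dict[str, list[float]],
                         threshold_s: float = 1.0) -> dict[str, list[float]]:
    """Remove exposures that do not exceed threshold_s; drop segments left empty."""
    kept: dict[str, list[float]] = {}
    for seg, ds in durations.items():
        # "Does not exceed" means d <= threshold_s, so keep only d > threshold_s.
        long_enough = [d for d in ds if d > threshold_s]
        if long_enough:
            kept[seg] = long_enough
    return kept
```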
In step 732, the criticality of each segment in the article is determined based on the normalized exposure duration of each segment in the article.
The normalized exposure duration of a segment can be used directly as its criticality, or the criticality can be derived from it by further calculation, provided the criticality is positively correlated with the normalized exposure duration.
Normalizing the exposure durations before determining criticality allows the criticality of the segments to be measured more accurately.
In an embodiment of the present application, obtaining the behavior data generated when the plurality of objects access the respective segments in the content further includes: acquiring the exposure count of each segment in the article when the plurality of objects read the article. Determining the criticality of each segment in the article according to the normalized exposure duration of each segment then includes: normalizing the exposure counts of all segments in the article to obtain the normalized exposure count of each segment; and determining the criticality of each segment in the content according to the normalized exposure duration and the normalized exposure count of each segment in the article.
Each time a segment of an article appears completely on the screen counts as one exposure of that segment, so the behavior data in this embodiment further includes the exposure count of each segment.
Similar to the normalization of exposure durations, the normalized exposure count of each segment can be obtained using the following formula:
$$E_i = \frac{\sum_{j=1}^{n_i} e_i^{j}}{\sum_{k=1}^{m} \sum_{j=1}^{n_k} e_k^{j}}$$
where E_i is the normalized exposure count of the i-th segment, e_i^j is the exposure count contributed by the j-th exposure record of the i-th segment, n_i is the total number of raw exposures of the i-th segment, m is the number of segments, and the denominator is the sum of the exposure counts of all exposures of all segments.
According to these formulas, the normalized exposure duration and the normalized exposure count are both real numbers between 0 and 1, and determining the criticality of a segment from both of them measures its criticality more accurately.
The criticality of a segment can be calculated from the normalized exposure duration and the normalized exposure count in various ways, as long as the calculated criticality is positively correlated with both.
In an embodiment of the present application, determining the criticality of each segment in the content according to the normalized exposure duration and the normalized exposure count of each segment in the article includes: determining the weighted sum of the normalized exposure duration and the normalized exposure count of each segment as the criticality of that segment.
The weights of the normalized exposure duration and the normalized exposure count in the weighted sum can be set as needed; for example, the weight of the normalized exposure duration can be made larger than that of the normalized exposure count.
In one embodiment of the present application, the criticality of a segment is determined by the following formula:
[Formula image BDA0003800936130000144: v_i expressed in terms of E_i and T_i]
where E_i is the normalized exposure count of the i-th segment, T_i is the normalized exposure duration of the i-th segment, and v_i is the criticality of the i-th segment.
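Because the exact formula for v_i is not recoverable here, the sketch below uses the weighted-sum embodiment described earlier, with hypothetical weights of 0.7 for the normalized exposure duration and 0.3 for the normalized exposure count (chosen only to satisfy the example that the duration weight is the larger one). It computes T_i and E_i by the normalization formulas above, then combines them.

```python
def _normalize(totals: dict[str, float]) -> dict[str, float]:
    """Divide each segment's total by the total over all segments."""
    grand_total = sum(totals.values())
    return {k: (v / grand_total if grand_total else 0.0) for k, v in totals.items()}


def criticality(durations: dict[str, list[float]],
                counts: dict[str, list[float]],
                w_time: float = 0.7, w_count: float = 0.3) -> dict[str, float]:
    """v_i = w_time * T_i + w_count * E_i, with hypothetical weights."""
    t_norm = _normalize({s: sum(ds) for s, ds in durations.items()})  # T_i
    e_norm = _normalize({s: sum(cs) for s, cs in counts.items()})     # E_i
    segments = t_norm.keys() | e_norm.keys()
    return {s: w_time * t_norm.get(s, 0.0) + w_count * e_norm.get(s, 0.0)
            for s in sorted(segments)}


v = criticality({"s1": [6.0, 2.0], "s2": [2.0]}, {"s1": [1, 1], "s2": [1, 1, 1]})
# T = {s1: 0.8, s2: 0.2}, E = {s1: 0.4, s2: 0.6}, so v is about {s1: 0.68, s2: 0.32}
```

With weights that sum to 1, the criticality values also stay between 0 and 1.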
In an embodiment of the application, the behavior data further includes active operation record data, and normalizing the exposure durations corresponding to the multiple exposures of each segment to obtain the normalized exposure duration of each segment includes: for each segment in the article, weighting the exposure duration of at least one exposure of the segment according to the active operation record data generated by the segment during exposure, and replacing that exposure duration with the corresponding weighted exposure duration; and normalizing the multiple exposure durations corresponding to each segment, which include the weighted exposure durations, to obtain the normalized exposure duration of each segment.
In one embodiment of the present application, the active operation record data is at least one of the following: the segment is exposed by a sliding operation; a selection operation is performed on at least part of the segment; and the segment is displayed in a target area of the screen.
When a segment generates active operation record data during a certain exposure, the exposure duration of that exposure is weighted. The weighting method can be set as required, as long as the resulting weighted exposure duration is greater than the exposure duration it was derived from. Thus, for each segment, normalization is performed on the weighted exposure duration of at least one exposure together with the unweighted exposure durations of the other exposures.
Specifically, the screen may be divided into an upper area, a middle area, and a lower area, either evenly or in other specified proportions, and the target area of the screen may be the middle area. Because users pay different amounts of attention to different screen areas, content displayed in the middle area is more likely to be what the user is actually reading. Treating the display of a segment in the target area as active operation record data, and processing the exposure duration and other behavior data of the corresponding segment accordingly, therefore determines the normalized exposure duration more accurately and identifies key segments more accurately.
When a segment is displayed in the middle area of the screen, only the portion of the exposure duration spent in the middle area may be weighted; for example, that portion may be multiplied by 2 while the time spent in other areas is kept unchanged, yielding the weighted exposure duration.
When a user exposes a segment by sliding up, which signals that the user is rereading it, the exposure duration of the segment can be doubled.
When the user selects at least part of a segment, which indicates that the user is paying extra attention to the selected content, the exposure duration of the segment can likewise be doubled; alternatively, the weight applied to the exposure duration can be determined by the ratio of the length of the selected content to the length of the segment.
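The duration-weighting rules above can be combined as in the sketch below. The `Exposure` record and its fields are hypothetical, and two choices are assumptions rather than statements of the document: the weights for different operations are multiplied together when several occur in one exposure, and the selection weight is taken as 1 plus the selected share of the segment (so a fully selected segment is doubled).

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    duration: float               # seconds from appearing to disappearing
    middle_duration: float = 0.0  # part of `duration` spent in the middle screen area
    slid_up: bool = False         # segment was exposed by sliding up (rereading)
    selected_chars: int = 0       # characters of the segment the user selected
    segment_chars: int = 1        # total characters in the segment


def weighted_duration(e: Exposure) -> float:
    # Middle-area time counts twice; time in other areas is unchanged.
    d = e.duration + e.middle_duration
    if e.slid_up:
        d *= 2  # rereading signal: double the duration
    if e.selected_chars > 0:
        # Hypothetical: weight grows with the selected share, capped at 2x.
        share = min(e.selected_chars / e.segment_chars, 1.0)
        d *= 1 + share
    return d


print(weighted_duration(Exposure(4.0, middle_duration=1.0, slid_up=True)))  # 10.0
```

An exposure with no active operation and no middle-area time keeps its original duration, so only exposures with active operation record data are increased, as the text requires.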
FIG. 10 illustrates a flowchart for obtaining the exposure count of each segment in an article when the article is read by a plurality of objects, according to one embodiment of the application. As shown in fig. 10, acquiring the exposure count of each segment in the article may specifically include the following steps:
in step 1010, each exposure record of each segment in the article is obtained while the plurality of objects read the article, and the raw exposure count is calculated from the exposure records.
Each exposure of a segment corresponds to one exposure record; that is, each exposure contributes one raw exposure count to the segment.
In step 1020, for each segment in the article, the raw exposure count of at least one exposure record of the segment is weighted according to the active operation record data generated by the segment during exposure, yielding a weighted raw exposure count.
When a segment generates active operation record data during a certain exposure, the raw exposure count of that exposure is weighted. The weighting method can be set as needed, as long as the weighted raw exposure count is greater than the unweighted raw exposure count it was derived from.
When a segment is exposed by the user's sliding operation, at least part of the segment is selected by the user, or the segment is displayed in the middle area of the screen, the weighted raw exposure count can be obtained by multiplying the raw exposure count of that single exposure by 2.
In step 1030, for each segment in the article, the exposure count of the segment is obtained from the weighted raw exposure count of at least one exposure record and the raw exposure counts of the other exposure records.
When no active operation record data is generated, the raw exposure count of an exposure record is 1; if the segment generates active operation record data during an exposure, the weighted raw exposure count of that exposure is a real number greater than 1.
When the segment generates a plurality of items of active operation record data during a certain exposure, a larger weight value for weighting processing can be set so as to further improve the accuracy of the calculated criticality.
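For the exposure count, the sketch below assigns each exposure record a raw count of 1 and doubles it for each active operation observed during that exposure. Doubling per operation is an assumption that matches the factor of 2 above and gives a larger weight when several operations occur; the operation labels `"slide"`, `"select"`, and `"middle"` are hypothetical.

```python
ACTIVE_OPERATIONS = {"slide", "select", "middle"}  # hypothetical labels


def record_count(operations: set[str]) -> float:
    """Raw count of one exposure record: 1, doubled per active operation."""
    n_ops = len(operations & ACTIVE_OPERATIONS)
    return float(2 ** n_ops)


def exposure_count(records: list[set[str]]) -> float:
    """Exposure count of a segment: sum of its (possibly weighted) record counts."""
    return sum((record_count(ops) for ops in records), 0.0)


# Three exposures, one of which involved a selection: 1 + 2 + 1
print(exposure_count([set(), {"select"}, set()]))  # 4.0
```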
In the embodiment of the application, the exposure duration and/or the exposure count is weighted according to the active operation record data generated when segments are exposed, so the criticality that measures how critical each segment is within the content can be calculated more accurately, and key segments can be identified more accurately.
At step 740, at least one key snippet is identified among the plurality of snippets of content based on the criticality of each snippet in the content.
The criticality of a key segment is typically higher than the criticality of other segments in the content.
In one embodiment of the present application, identifying at least one key snippet among the plurality of snippets of the content according to the criticality of each snippet includes: taking the segment with the highest criticality in the content as the key segment.
In one embodiment of the present application, identifying at least one key snippet among the plurality of snippets of the content according to the criticality of each snippet includes: selecting, from the content, the segments whose criticality is higher than a preset criticality threshold as key segments.
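Both selection rules are short in code. In the sketch below, `crit` is a hypothetical mapping from segment identifiers to criticality values; `top_segment` raises `ValueError` on an empty mapping, which a caller would need to handle.

```python
def top_segment(crit: dict[str, float]) -> str:
    """Segment with the highest criticality."""
    return max(crit, key=crit.__getitem__)


def above_threshold(crit: dict[str, float], threshold: float) -> list[str]:
    """Segments whose criticality is higher than the threshold, in insertion order."""
    return [seg for seg, value in crit.items() if value > threshold]


crit = {"s1": 0.2, "s2": 0.5, "s3": 0.3}
print(top_segment(crit))            # s2
print(above_threshold(crit, 0.25))  # ['s2', 's3']
```

The threshold rule can return zero, one, or several key segments, whereas the top-1 rule always returns exactly one.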
In one embodiment of the present application, identifying at least one key snippet among a plurality of snippets of content based on the criticality of each snippet in the content includes: determining a first candidate key segment from a plurality of segments of the content according to the criticality of each segment in the content; extracting the behavior characteristics of each segment from behavior data generated when a plurality of objects access each segment in the content; respectively inputting the behavior characteristics of each segment into a pre-established artificial intelligence model to obtain a prediction probability value which is output by the artificial intelligence model and corresponds to each segment; determining a second candidate key fragment in the plurality of fragments of the content according to the prediction probability value corresponding to each fragment in the content; and determining the key segments according to the first candidate key segment and the second candidate key segment.
The types of behavior features can be predefined by experts and calculated in various ways, including the same way as the normalized exposure duration and the normalized exposure count, and each segment can have multiple behavior features. The artificial intelligence model can use various algorithms, such as a logistic regression model or a deep learning model, and the prediction probability value it outputs estimates the probability that a segment is a key segment.
The segment with the corresponding prediction probability value larger than the predetermined probability threshold may be used as the second candidate key segment, or the segment with the maximum corresponding prediction probability value may be used as the second candidate key segment.
In this embodiment, candidate key segments are identified in two separate ways, and the two sets of candidates are combined to identify the key segments, which improves the accuracy of key segment identification.
In one embodiment of the present application, determining a key segment according to a first candidate key segment and a second candidate key segment includes: if the first candidate key segment and the second candidate key segment are consistent, the first candidate key segment or the second candidate key segment is taken as the key segment.
In an embodiment of the present application, determining the key segments according to the first candidate key segments and the second candidate key segments includes: taking the intersection of the first candidate key segments and the second candidate key segments as the key segments.
In this embodiment, only a segment that belongs to both the first candidate key segments and the second candidate key segments can become a key segment, which ensures the accuracy of key segment identification.
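The two-path identification can be sketched with logistic regression as the artificial intelligence model, one of the options named above. The feature vectors, weights, bias, and both thresholds are hypothetical placeholders; in practice the model parameters would come from training, which is not shown.

```python
import math


def logistic_prob(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic regression output: probability that a segment is a key segment."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def key_segments(crit: dict[str, float], features: dict[str, list[float]],
                 weights: list[float], bias: float,
                 crit_threshold: float = 0.3, prob_threshold: float = 0.5) -> set[str]:
    # First candidates: criticality above a threshold.
    first = {s for s, v in crit.items() if v > crit_threshold}
    # Second candidates: model probability above a threshold.
    second = {s for s, f in features.items()
              if logistic_prob(f, weights, bias) > prob_threshold}
    # Key segments: the intersection of both candidate sets.
    return first & second


crit = {"s1": 0.5, "s2": 0.4, "s3": 0.1}
features = {"s1": [0.8, 0.6], "s2": [0.1, 0.1], "s3": [0.9, 0.9]}
print(key_segments(crit, features, weights=[2.0, 2.0], bias=-1.5))  # {'s1'}
```

Here s2 passes only the criticality test and s3 passes only the model test, so the intersection keeps s1 alone.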
FIG. 11 shows a flowchart of steps following step 740 in FIG. 7, according to one embodiment of the present application. Referring to fig. 11, the following steps may be included after step 740:
in step 750, when the target object is accessing content, the content is provided to the target object and key snippets in the content are highlighted.
When a user requests access to an article, key snippets in the article are highlighted while the article is returned to the user.
Highlighting key snippets may be accomplished by one or more of bolding, changing font, zooming, tilting, underlining, changing color, etc.
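As one concrete way to highlight key snippets in a web page, the sketch below wraps them in `<strong>` tags (bolding) and HTML-escapes all text. The `(segment_id, text)` input format and the function name `render_with_highlights` are assumptions.

```python
import html


def render_with_highlights(segments: list[tuple[str, str]], key_ids: set[str]) -> str:
    """Concatenate segment texts as HTML, bolding the key segments."""
    parts = []
    for seg_id, text in segments:
        escaped = html.escape(text)  # prevent article text from injecting markup
        parts.append(f"<strong>{escaped}</strong>" if seg_id in key_ids else escaped)
    return "".join(parts)


print(render_with_highlights([("s1", "A < B. "), ("s2", "Key point.")], {"s2"}))
# A &lt; B. <strong>Key point.</strong>
```

Other highlighting styles listed above, such as color or underlining, would swap the tag for a styled `<span>` or `<mark>`.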
Fig. 12 shows a flowchart of steps following step 740 in fig. 7 according to another embodiment of the application. Referring to fig. 12, the following steps may be included after step 740:
in step 760, when the target object is accessing the content, an operation control is provided to the target object in the display interface of the content.
The operation control can be, for example, a button: when a user reads an article, the display interface shows the complete content of the article together with the button.
In step 770, in response to the target object triggering the operation of the operation control, only the key segments in the content are displayed in the display interface.
When the user clicks the button with a mouse or taps it on the screen, the display interface of the article switches directly from the complete content of the article to only the key segments of the article, improving the efficiency with which the user obtains information.
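The button behavior can be modeled as a small piece of view state, sketched below with a hypothetical `ArticleView` class. Toggling back to the complete content on a second click is an assumption; the text only describes switching from the complete content to the key segments.

```python
class ArticleView:
    """Hypothetical view state for an article display interface."""

    def __init__(self, segments: list[tuple[str, str]], key_ids: set[str]) -> None:
        self.segments = segments
        self.key_ids = set(key_ids)
        self.key_only = False  # the complete content is shown first

    def on_button_click(self) -> None:
        # Switch between the complete content and the key segments only.
        self.key_only = not self.key_only

    def visible_text(self) -> list[str]:
        return [text for seg_id, text in self.segments
                if not self.key_only or seg_id in self.key_ids]


view = ArticleView([("s1", "Intro."), ("s2", "Key point.")], {"s2"})
view.on_button_click()
print(view.visible_text())  # ['Key point.']
```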
Of course, in other embodiments of the present application, whether to highlight the key segments in the article may also be switched by setting the operation control.
Fig. 13 shows a schematic overall flow diagram of a solution according to an embodiment of the present application. Please refer to fig. 13, which may specifically include the following processes: firstly, dividing article fragments; then, performing behavior data recording according to the divided segments to obtain behavior data; next, performing behavior data processing; and finally, calculating the criticality of each segment according to the processing result, and identifying the key segment based on the criticality.
In summary, the method for identifying key segments in content provided by the embodiments of the application can use big data to automatically extract the key segments of the content that are more valuable and more worth accessing; providing these key segments to the user improves the efficiency with which the user acquires information and saves the user's time. When the content is an article, the scheme finds the most valuable parts of the article through behavior data analysis, so the user can clearly see the essential parts, go straight to the key points, and then attend to the details. This gives the user a quick, streamlined reading experience and more control over the whole reading process.
Embodiments of the apparatus of the present application are described below, which can be used to perform the method for identifying key segments in content in the above embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method for identifying key segments in the above-mentioned description of the present application.
FIG. 14 shows a block diagram of an apparatus for identifying key snippets in content according to one embodiment of the present application.
Referring to fig. 14, an apparatus 1400 for identifying key segments in content according to an embodiment of the present application includes: a division unit 1410, a behavior data acquisition unit 1420, a determination unit 1430, and a recognition unit 1440. The dividing unit 1410 is configured to divide the content into a plurality of segments; the behavior data acquiring unit 1420 is configured to acquire the behavior data generated when a plurality of objects access the respective segments of the content; the determining unit 1430 is configured to determine the criticality of each segment in the content according to the behavior data generated when the plurality of objects access each segment, where the criticality measures how critical each segment is within the content; and the identifying unit 1440 is configured to identify at least one key segment among the plurality of segments of the content according to the criticality of each segment of the content.
In some embodiments of the present application, based on the foregoing scheme, where the content is an article, the behavior data acquiring unit 1420 is configured to: obtain the exposure duration of each exposure of each segment in the article when the article is read by a plurality of objects, so as to obtain the exposure durations corresponding to the multiple exposures of each segment, wherein the exposure duration is the time from when the segment appears on the screen to when it disappears from the screen.
In some embodiments of the present application, based on the foregoing scheme, the determining unit 1430 is configured to: according to the exposure duration of each exposure of each segment in an article when the article is read by a plurality of objects, normalizing the exposure duration corresponding to the multiple exposures of each segment respectively to obtain the normalized exposure duration of each segment; and determining the criticality of each segment in the article according to the normalized exposure duration of each segment in the article.
In some embodiments of the present application, based on the foregoing scheme, the behavior data acquiring unit 1420 is further configured to: acquire the exposure count of each segment in an article when the article is read by a plurality of objects; and the determination unit 1430 is further configured to: normalize the exposure counts of all segments in the article to obtain the normalized exposure count of each segment; and determine the criticality of each segment in the content according to the normalized exposure duration and the normalized exposure count of each segment in the article.
In some embodiments of the present application, based on the foregoing scheme, the behavior data further includes active operation record data, and the determining unit 1430 is configured to: for each segment in the article, weighting exposure duration of at least one exposure of the segment according to active operation record data generated by the segment during exposure, and replacing the exposure duration with corresponding weighted exposure duration; and normalizing the multiple exposure durations corresponding to each segment to obtain the normalized exposure duration of each segment, wherein the multiple exposure durations comprise the weighted exposure duration.
In some embodiments of the present application, based on the foregoing scheme, the behavior data acquiring unit 1420 is further configured to: acquire every exposure record of each segment in the article while the article is read by a plurality of objects, and calculate an original exposure count from each exposure record; for each segment in the article, weight the original exposure count of at least one exposure record of the segment according to the active operation record data generated while the segment was exposed, to obtain a weighted original exposure count; and, for each segment in the article, obtain the number of exposures of the segment from the weighted original exposure counts of those exposure records and the original exposure counts of the remaining exposure records.
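A minimal sketch of the weighted exposure count, assuming each exposure record has an original count of 1 and a record with active operations counts `1 * (1 + sum(weights))`. The record format `(segment_id, operations)` and the weights are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-operation weights (not specified in the text).
OPERATION_WEIGHTS = {"copy": 0.5, "highlight": 0.8, "comment": 1.0, "share": 0.6}


def weighted_exposure_counts(records, weights=OPERATION_WEIGHTS):
    """records: iterable of (segment_id, operations), one tuple per exposure record.

    Records without active operations keep their original count of 1;
    records with active operations contribute a weighted count instead.
    """
    counts = defaultdict(float)
    for segment_id, operations in records:
        original = 1.0
        boost = sum(weights.get(op, 0.0) for op in operations)
        counts[segment_id] += original * (1.0 + boost)
    return dict(counts)
```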
In some embodiments of the present application, based on the foregoing scheme, the identification unit 1440 is configured to: determine first candidate key segments among the plurality of segments of the content according to the criticality of each segment in the content; extract behavior features of each segment from the behavior data generated when the plurality of objects access each segment in the content; input the behavior features of each segment into a pre-established artificial intelligence model to obtain the prediction probability value output by the model for that segment; determine second candidate key segments among the plurality of segments of the content according to the prediction probability value of each segment; and determine the key segments according to the first candidate key segments and the second candidate key segments.
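The two-path selection can be sketched as below. Everything concrete here is an assumption: first candidates are the `top_k` segments by criticality, the "pre-established model" is replaced by a toy logistic function with made-up weights over a hypothetical feature tuple (normalized duration, normalized count, active-operation rate), and the two candidate sets are merged by intersection or union, since the text does not say which.

```python
import math


def first_candidates(criticality, top_k=2):
    # Rule-based candidates: the top_k segments by criticality (top_k is hypothetical).
    ranked = sorted(criticality, key=criticality.get, reverse=True)
    return set(ranked[:top_k])


def toy_logistic_model(features, weights=(1.5, 0.8, 1.2), bias=-2.0):
    # Stand-in for the pre-established AI model: a logistic function with
    # made-up weights that returns a probability in (0, 1).
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def second_candidates(features_by_segment, model, threshold=0.5):
    # Model-based candidates: segments whose predicted probability reaches the threshold.
    return {seg for seg, x in features_by_segment.items() if model(x) >= threshold}


def key_segments(criticality, features_by_segment, model=toy_logistic_model, mode="intersection"):
    from_rules = first_candidates(criticality)
    from_model = second_candidates(features_by_segment, model)
    # The merge rule is unspecified; intersection favors precision, union favors recall.
    return from_rules & from_model if mode == "intersection" else from_rules | from_model
```

Passing the model as a callable keeps the selection logic independent of whichever trained model is actually used.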
In some embodiments of the present application, based on the foregoing solution, after at least one key segment has been identified among the plurality of segments of the content according to the criticality of each segment in the content, the identification unit 1440 is further configured to: when a target object accesses the content, provide the content to the target object with the key segments in the content highlighted.
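One way to present highlighted key segments is to render them with HTML `<mark>` tags. This is a sketch assuming the content is shown as HTML paragraphs, one per segment; the text does not specify the rendering technology.

```python
import html


def render_with_highlights(segments, key_ids):
    """segments: segment texts in reading order; key_ids: indices of key segments."""
    parts = []
    for i, text in enumerate(segments):
        body = html.escape(text)  # escape user content before wrapping it in markup
        if i in key_ids:
            body = f"<mark>{body}</mark>"
        parts.append(f"<p>{body}</p>")
    return "\n".join(parts)
```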
In some embodiments of the present application, based on the foregoing scheme, after at least one key segment has been identified among the plurality of segments of the content according to the criticality of each segment in the content, the identification unit 1440 is further configured to: when a target object accesses the content, provide an operation control to the target object in the display interface of the content; and, in response to the target object triggering the operation control, display only the key segments of the content in the display interface.
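The operation control can be modeled as a small piece of view state. In this sketch the class `ContentView` and its methods are hypothetical, and letting a second trigger restore the full content is an added assumption; the text only requires that triggering the control shows the key segments alone.

```python
from dataclasses import dataclass


@dataclass
class ContentView:
    segments: list
    key_ids: set
    key_only: bool = False  # state of the hypothetical "key segments only" control

    def on_control_triggered(self):
        # Assumption: each trigger toggles between full content and key segments only.
        self.key_only = not self.key_only

    def visible_segments(self):
        if self.key_only:
            return [s for i, s in enumerate(self.segments) if i in self.key_ids]
        return list(self.segments)
```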
FIG. 15 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1500 of the electronic device shown in fig. 15 is only an example and does not limit the functions or the scope of use of the embodiments of the present application.
As shown in fig. 15, the computer system 1500 includes a Central Processing Unit (CPU) 1501, which can perform various appropriate actions and processes, such as the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1502 or a program loaded from a storage section 1508 into a Random Access Memory (RAM) 1503. The RAM 1503 also stores various programs and data required for system operation. The CPU 1501, the ROM 1502, and the RAM 1503 are connected to each other by a bus 1504. An Input/Output (I/O) interface 1505 is also connected to the bus 1504.
The following components are connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, and the like; an output section 1507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it can be installed into the storage section 1508 as needed.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1509, and/or installed from the removable medium 1511. When the computer program is executed by a Central Processing Unit (CPU) 1501, various functions defined in the system of the present application are executed.
It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As an aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that specific implementations of the present application involve data on users' behavior when accessing content. When the above embodiments of the present application are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A method for identifying key segments in content, the method comprising:
dividing the content into a plurality of segments;
acquiring behavior data generated when a plurality of objects access each segment in the content;
determining the criticality of each segment in the content according to the behavior data generated when the plurality of objects access each segment in the content, wherein the criticality measures how critical the segment is within the content;
identifying at least one key segment among the plurality of segments of the content according to the criticality of each segment in the content.
2. The method for identifying key segments in content according to claim 1, wherein the content is an article, and the acquiring behavior data generated when a plurality of objects access each segment in the content comprises:
obtaining the exposure duration of every exposure of each segment in the article when the article is read by a plurality of objects, so as to obtain the exposure durations corresponding to the multiple exposures of each segment, wherein an exposure duration is the time from when the segment appears on the screen until it disappears from the screen.
3. The method for identifying key segments in content according to claim 2, wherein the determining the criticality of each segment in the content according to the behavior data generated by the plurality of objects when accessing each segment in the content comprises:
normalizing, based on the exposure duration of every exposure of each segment in the article when the article is read by a plurality of objects, the exposure durations corresponding to the multiple exposures of each segment, to obtain a normalized exposure duration of each segment;
and determining the criticality of each segment in the article according to the normalized exposure duration of each segment in the article.
4. The method for identifying key segments in content according to claim 3, wherein said obtaining behavior data generated by a plurality of objects when accessing each segment in the content further comprises:
acquiring the number of exposures of each segment in the article when the article is read by a plurality of objects;
determining the criticality of each segment in the article according to the normalized exposure duration of each segment in the article, including:
normalizing the numbers of exposures of all segments in the article to obtain the normalized number of exposures of each segment;
and determining the criticality of each segment in the content according to the normalized exposure duration and the normalized number of exposures of each segment in the article.
5. The method for identifying key segments in content according to claim 4, wherein the behavior data further includes active operation record data, and the normalizing of the exposure durations corresponding to the multiple exposures of each segment, based on the exposure duration of every exposure of each segment in the article when the article is read by a plurality of objects, to obtain the normalized exposure duration of each segment comprises:
for each segment in the article, weighting the exposure duration of at least one exposure of the segment according to the active operation record data generated while the segment was exposed, and replacing that exposure duration with the corresponding weighted exposure duration;
and normalizing the multiple exposure durations corresponding to each segment, which include the weighted exposure durations, to obtain the normalized exposure duration of each segment.
6. The method for identifying key segments in content according to claim 5, wherein the acquiring the number of exposures of each segment in the article when the article is read by a plurality of objects comprises:
acquiring every exposure record of each segment in the article when the article is read by a plurality of objects, and calculating an original exposure count from each exposure record;
for each segment in the article, weighting the original exposure count of at least one exposure record of the segment according to the active operation record data generated while the segment was exposed, to obtain a weighted original exposure count;
and, for each segment in the article, obtaining the number of exposures of the segment from the weighted original exposure count of the at least one exposure record and the original exposure counts of the other exposure records.
7. The method for identifying key segments in content according to claim 1, wherein the identifying at least one key segment in the plurality of segments of the content according to the criticality of each segment in the content comprises:
determining a first candidate key segment in a plurality of segments of the content according to the criticality of each segment in the content;
extracting behavior characteristics of each segment from behavior data generated when the plurality of objects access each segment in the content;
respectively inputting the behavior characteristics of each segment into a pre-established artificial intelligence model to obtain a prediction probability value which is output by the artificial intelligence model and corresponds to each segment;
determining a second candidate key segment in the plurality of segments of the content according to the prediction probability value corresponding to each segment in the content;
and determining a key segment according to the first candidate key segment and the second candidate key segment.
8. The method for identifying key segments in content according to any one of claims 1-7, wherein after identifying at least one key segment in the plurality of segments of the content according to the criticality of each segment in the content, the method further comprises:
when a target object accesses the content, providing the content to the target object and highlighting the key segments in the content.
9. The method for identifying key segments in content according to any one of claims 1-7, wherein after identifying at least one key segment among the plurality of segments of the content according to the criticality of each segment in the content, the method further comprises:
when a target object accesses the content, providing an operation control to the target object in a display interface of the content;
and in response to a triggering operation by the target object on the operation control, displaying only the key segments of the content in the display interface.
10. An apparatus for identifying key segments in content, the apparatus comprising:
a dividing unit for dividing the content into a plurality of segments;
a behavior data acquisition unit for acquiring behavior data generated when a plurality of objects access respective segments in the content;
a determining unit, configured to determine the criticality of each segment in the content according to the behavior data generated when the plurality of objects access each segment in the content, wherein the criticality measures how critical the segment is within the content;
an identification unit, configured to identify at least one key segment among the plurality of segments of the content according to the criticality of each segment in the content.
11. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method of identifying key segments in content according to any one of claims 1 to 9.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of identifying key segments in content according to any one of claims 1 to 9.
13. A computer program product comprising computer instructions stored in a computer readable storage medium, the computer instructions being read from the computer readable storage medium by a processor of a computer device, the processor executing the computer instructions to cause the computer device to perform the method of identifying key segments in content as claimed in any one of claims 1 to 9.
CN202210982950.3A 2022-08-16 2022-08-16 Method, device, medium and electronic equipment for identifying key segments in content Pending CN115357790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210982950.3A CN115357790A (en) 2022-08-16 2022-08-16 Method, device, medium and electronic equipment for identifying key segments in content

Publications (1)

Publication Number Publication Date
CN115357790A true CN115357790A (en) 2022-11-18

Family

ID=84033415

Country Status (1)

Country Link
CN (1) CN115357790A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination