WO2015012720A1 - Method for checking web pages containing real-time multimedia streams, and computer-implemented system for carrying out the method

Publication number
WO2015012720A1
Authority
WO
WIPO (PCT)
Prior art keywords
stream
streams
links
database
multimedia
Application number
PCT/RU2013/001055
Other languages
English (en)
Russian (ru)
Inventor
Денис Олегович ОРЕЛ
Алексей Николаевич ФОМИЧЕВ
Original Assignee
Общество С Ограниченной Ответственностью "Балакам"
Application filed by Общество С Ограниченной Ответственностью "Балакам"

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Definitions

  • The present invention relates to computer and network technologies, namely to Internet search engines whose purpose is to download, analyze, save and index web pages containing targeted content, for example a real-time multimedia stream, also called a "live" stream or "live" content.
  • The invention relates to a technology for rechecking web pages that were previously found by search robots on the Internet and that host real-time multimedia streams.
  • Rechecking is carried out according to a schedule that determines the order (using a significance coefficient for the page) and the period (frequency) of rechecking. All web pages containing real-time multimedia streams should be rechecked within a certain period so that the search system stores up-to-date data and can later offer the user the ability to search it.
  • The invention can be applied both to searching for specific media objects (online radio streams, streams from webcams, video streams, etc.) and to searching for objects in the form of links to external sources of a certain type that signal the presence of target real-time audio and/or video content, for example links using the data transfer schemes rtmp, rtsp, mms, etc.
  • Search engines that let users search the Internet for web pages containing the information they need, based on the search queries they enter, are widely known and used worldwide.
  • Popular search engines are, in particular, Yahoo!, Google, Yandex, Rambler.
  • The general principle of operation of well-known search engines is to collect information from web pages on the Internet, then process and index it so that the user can search for the necessary information within the volume that the search engine has processed.
  • Each search engine includes search robots, whose purpose is to scan web pages on the Internet and load them. After contacting the specified web page address, the search robot scans, for example, the HTTP headers to check when the page was last modified. If the search robot has already viewed this web page and the date of its last modification has changed, the robot loads the page again for processing; if the page has never been viewed, it is loaded for processing immediately.
  • Web pages loaded by the search robot are processed by the corresponding software and hardware components of the search engine.
  • The purpose of this processing is to analyze the page. As a rule, the title is extracted from the web page first, since it carries general information about the page. Next, all text that is highlighted in some way is extracted and processed, for example text in italics, underlined text, or text in a font size larger than that of the main text, since the search engine assumes that emphasized passages are the key places in the text.
  • Some search engines look at the meta tags of web pages on the assumption that they contain the page's keywords or key phrases. However, since meta tags often contain inaccurate information, some search engines do not use them to determine the keywords of the page.
  • the entire text of the web page is fully processed.
  • Search engines that do not use meta tags to determine the keywords of a web page find keywords by checking how often a particular word appears in the text. For this purpose, all "stop words" such as "a", "he", "you", "b", as well as all symbols and numbers, are removed from the text, since they create noise when searching for keywords.
  • The processed text of the web page is indexed by the search engine so that the user can conveniently search the search engine database through a web interface (for example, a browser) by entering search queries.
  • search engines constructed in this way cease to meet today's requirements due to the ever-increasing volume and variety of information presented on the Internet.
  • An extensive result list of web pages is produced in which the proportion of pages that truly meet the user's requirements is small, since, owing to the specifics of the search engines described, this list includes web pages that contain mentions, discussions, advertisements or reviews of pictures or videos but do not directly contain the pictures or videos sought.
  • The proportion of relevant web pages for such specific searches will only decrease, and as a result users will be forced to build repeated search queries and spend time sifting through large arrays of search results.
  • This problem makes it relevant to create specialized (so-called vertical) search engines that are strictly focused on searching thematic Internet resources and that include a system for rechecking found objects according to a special schedule.
  • the invention provides a method for constructing a re-check schedule for web documents based on information about the document being checked.
  • The frequency of change of the web document itself is determined from the history of its previous checks, which makes it possible to determine the time interval within which the web document changed and, based on this information, to calculate the optimal time interval for rechecking it.
  • The known solution is based on rechecking all possible web documents on the Internet. It does not take into account the possibility of excluding non-target data from checking. It also requires storing the history of previous checks of web documents, which is very costly given the amount of data available on the Internet. Moreover, the known solution cannot make a recheck decision promptly, since the recheck schedule is built from the history of previous page checks.
  • The objective of the present invention is to provide a method and system focused on identifying web pages with one or more links to a real-time multimedia stream, based on the results of checking an array of web pages according to a specific schedule.
  • The technical result of the invention is more efficient (including faster) detection of real-time multimedia streams whose links are contained in the checked web pages, fewer resource-intensive operations spent on checking web pages that do not contain such content (that is, optimized use of computing and network resources), and less time needed to save data and keep it up to date, together with more reliable results.
  • The inventive method can be implemented at significantly lower cost, including time cost, than well-known counterparts require to search for web pages with target content, while increasing the relevance of the detected web pages with respect to the presence of the target content.
  • the results produced by the search program practically do not contain information noise.
  • the results obtained correspond to the search criteria of the real-time multimedia content set by the user and contain only reliable content, so the user spends less time filtering the search results.
  • a web page is a file directly containing the text of the web page and / or a script file associated with this web page.
  • Downloading a web page by reference can be done by emulating the operation of a web browser by building a model of a web document and creating all objects that potentially contain links to multimedia streams.
  • The period for checking links in the stream database that have the status of a real-time multimedia stream, in order to detect changes in the type, state and/or characteristics of the stream, can be selected from the range of 2 to 5 minutes.
  • the state of the stream is determined based on whether it is on or off.
  • The relationship database has a structure showing that a link to a web page is associated with one or more links to multimedia streams; streams whose type is real-time multimedia stream are marked in the relationship database.
  • As characteristics of the stream, a description of the multimedia stream and technical characteristics of the stream can be used.
  • As a description of a multimedia stream, the text description of the stream, the title of the stream, an indication of the owner of the stream, a link to the site of the stream, or any other data transmitted within the stream and reflecting its essence can be used.
  • As technical characteristics, the bitrate, format, information about audio or video codecs, or any other technical characteristics of the stream are used.
  • the availability of web pages is additionally determined, and if inaccessible web pages are detected, an appropriate check mark is made in the check schedule. If an unavailable web page is in this state for a week, the link to this web page is excluded from the scan schedule.
  • T_const is the specified check period, for example, 24 hours
  • T_min is the minimum check period
  • a computer-implemented system for checking web pages for the presence of multimedia streams in real time includes:
  • Schedule database including a list of links to web pages with a period for each link and the procedure for checking it
  • Stream database including a list of links to multimedia streams, as well as information about the type, status and characteristics of streams,
  • Relationship database storing information about the affiliation of the multimedia stream to the corresponding web page, as well as the type of multimedia stream
  • Data loading module configured to download web pages via a link from the schedule database and analyze downloaded web pages for links to multimedia streams in them
  • Data management module configured to save the links to multimedia streams found by the data loading module to the stream database, and to save and/or change information about the relationship between the multimedia stream and the web page in the relationship database,
  • Stream checking module configured to determine the type of each multimedia stream in the stream database and to periodically check links to real-time multimedia streams in order to detect changes in the type, state and/or characteristics of the stream, followed by storing the received information in the stream database,
  • Stream control module configured to detect changes made to the stream database, followed by recording information about the changes in the relationship database,
  • Schedule management module configured to change the schedule in the schedule database by adding new links to web pages on which links to real-time multimedia streams have been found, and/or by changing the check period of links to web pages already in the schedule for which stream changes have occurred, and/or by changing the start time of the next check. When the characteristics of a stream change, the start time of the next check of the corresponding web page is set to the current time while the check period is kept. When a change in the type and/or state of a stream increases the number of links to real-time multimedia streams on a web page, the check period is reduced; when the number of such links decreases, the check period is increased; and when the number of links to real-time multimedia streams becomes zero, the link to the web page is excluded from the check schedule.
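  • The schedule management rules above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say by how much the check period is reduced or increased, so the factor of 2, the value of T_MIN, and the names ScheduleEntry and update_entry are hypothetical. Only the 24-hour example for T_const and the four rules themselves come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

T_CONST = timedelta(hours=24)   # specified check period (24 hours is the example in the text)
T_MIN = timedelta(minutes=10)   # ASSUMPTION: the text names a minimum period but gives no value
FACTOR = 2                      # ASSUMPTION: the amount of reduction/increase is not specified


@dataclass
class ScheduleEntry:
    """One record of the schedule database (hypothetical layout)."""
    url: str
    period: timedelta
    next_check: datetime
    live_streams: int


def update_entry(entry: ScheduleEntry, new_live_count: int,
                 characteristics_changed: bool,
                 now: datetime) -> Optional[ScheduleEntry]:
    """Apply the rules to one entry; return None if the page leaves the schedule."""
    if new_live_count == 0:
        return None  # no real-time streams left: exclude the page
    if new_live_count > entry.live_streams:
        entry.period = max(T_MIN, entry.period / FACTOR)    # more live streams: check more often
    elif new_live_count < entry.live_streams:
        entry.period = min(T_CONST, entry.period * FACTOR)  # fewer live streams: check less often
    entry.live_streams = new_live_count
    if characteristics_changed:
        entry.next_check = now  # check immediately, period unchanged
    return entry
```

  • Clamping the period between T_MIN and T_CONST is a design choice of this sketch, not something the description states.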
  • the data loading module is also configured to emulate the operation of a web browser by building a model of a web document and creating all objects potentially containing links to multimedia streams. Additionally, the data loading module is configured to determine the availability of a web page, and in case of unavailable web pages, information about this is recorded in the schedule database. If an unavailable web page is in this state for a week, the schedule management module is implemented with the ability to exclude links to this web page from the schedule database.
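  • The one-week unavailability rule can be sketched as below. The dictionary of marks and the function name record_availability are hypothetical; only the one-week threshold is taken from the description.

```python
from datetime import datetime, timedelta
from typing import Dict

UNAVAILABLE_LIMIT = timedelta(weeks=1)  # from the text: excluded after a week of unavailability


def record_availability(unavailable_since: Dict[str, datetime], url: str,
                        available: bool, now: datetime) -> bool:
    """Update the unavailability mark for url; return True if it should be excluded."""
    if available:
        unavailable_since.pop(url, None)  # page is back: clear the mark
        return False
    # Keep the time of the first failed check so the duration can be measured.
    first_seen = unavailable_since.setdefault(url, now)
    return now - first_seen >= UNAVAILABLE_LIMIT
```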
  • The stream checking module is also configured to vary the period for checking links in the stream database that have the status of a real-time multimedia stream within the range of 2 to 5 minutes. When checking the state of a real-time stream, the stream checking module determines whether the stream is on or off.
  • The stream characteristics comprise a description of the multimedia stream and its technical characteristics. The description may be a text description of the stream, the title of the stream, an indication of its owner, a link to the site of the stream, or any other data transmitted within the stream that reflects its essence. The technical characteristics may be the bitrate, format, information about audio or video codecs, or any other technical characteristics of the stream.
  • the data loading module is configured to download web pages and analyze them starting from a link to a web page from the schedule database that has the highest coefficient K.
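  • A minimal sketch of this ordering, assuming the schedule is available as a list of (link, K, next check time) tuples; that layout and the name due_links are illustrative assumptions, not the format used by the described system.

```python
from datetime import datetime
from typing import List, Tuple


def due_links(schedule: List[Tuple[str, int, datetime]], now: datetime) -> List[str]:
    """Return links whose check time has come, highest coefficient K first."""
    due = [(k, url) for url, k, next_check in schedule if next_check <= now]
    due.sort(key=lambda item: -item[0])  # stable sort: equal K keeps schedule order
    return [url for _, url in due]
```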
  • A distinctive feature of the claimed technical solution is that the criterion for rechecking a web page (or web document) is not the web page itself but the connection of this web page with a link to a real-time multimedia stream (or information that the multimedia stream belongs to the web page).
  • The criterion for checking a web page is the presence on the page of a link to a multimedia server that broadcasts a multimedia stream in real time.
  • The claimed solution makes it possible to dynamically include web pages that link to multimedia streams in the check, or exclude them from it, when the type, state or characteristics of a stream change. For example, if a real-time multimedia stream is disabled, all web pages that link to this stream are excluded from rechecking. This decision is based not on the dynamics of changes in the web pages themselves but on an independent resource, such as a multimedia server, changes in which lead to a recheck of the web pages that refer to it.
  • FIG. 1 is a block diagram of the inventive system for checking web pages for the presence of real-time multimedia streams;
  • FIG. 2 is a flowchart of the calculation of web page check parameters;
  • FIG. 3 shows the mapping of a link to a web page to links to multimedia streams;
  • FIG. 4 shows an example of the relationships between links to web pages and links to streams;
  • FIG. 5 shows an example of the data relationships after rechecking;
  • FIG. 6 shows the algorithm for storing streams in the stream database;
  • FIG. 7 shows how changes are reflected in the relationship structure.
  • 1 - schematic representation of the movement of data between the modules of the system;
  • 2 - data loading module, which receives links to web pages from schedule database 9;
  • 3 - data management module, which creates and modifies associative links between links to web pages and links to multimedia streams;
  • 4 - stream database, which contains links to multimedia streams and all information about them;
  • 5 - stream checking module, which determines the type, state and characteristics of a multimedia stream;
  • 6 - relationship database, which stores the current associative links between links to web pages and links to multimedia streams;
  • 7 - stream control module, which detects changes in the type, state or characteristics of streams in stream database 4 and then records the changes in relationship database 6;
  • 8 - schedule management module, which makes changes to schedule database 9 by adding new records or modifying existing ones;
  • 9 - schedule database, which contains a list of links to web pages with a check period and check order for each link.
  • The claimed invention makes it possible to optimize the check schedule of web pages that contain links to real-time multimedia streams by calculating the optimal check period. The check period of a web page is changed on the basis of a change in:
  • - the type of the multimedia stream, that is, whether the link to the multimedia stream is a real-time stream; and/or
  • the basis for calculating the period of checking the web page are multimedia streams of real time, located on the web page.
  • Search robots find web pages on the Internet that host multimedia streams. All web pages containing real-time multimedia streams should be re-checked for a certain period in order to store up-to-date data in the search engine related to the checked web page to further enable the user to search.
  • A schedule stored in schedule database 9 is used; for each checked link to a web page it sets the significance coefficient, the check period, the end time of the last check and the start time of the next check. All links to web pages containing multimedia streams are stored in the system in schedule database 9. The streams located on a page are checked to determine whether they are real-time multimedia streams.
  • The purpose of rechecking web pages is to search them for new links to multimedia streams and to confirm that the links to multimedia streams found during the previous check are still present, as well as to update the availability of the page and the information on it that reflects its essence.
  • Data loading module 2 receives a list of links to web pages, which must be checked in accordance with the schedule from schedule database 9.
  • The loaded pages are analyzed by data loading module 2, which searches them for links to multimedia streams.
  • Information related to the multimedia stream is extracted in the form of a text description, which is later used as part of the description of the media stream.
  • the obtained information of the downloaded and analyzed web page and the links to the multimedia streams found in it is transmitted to the data management module 3.
  • the data management module stores the found links to the multimedia streams in the stream database 4.
  • The data management module receives from stream database 4 information about the current type of each multimedia stream in order to mark it in relationship database 6.
  • Data management module 3 then begins to check and make changes to relationship database 6:
  • All multimedia streams in stream database 4 are checked by stream checking module 5. All new streams are checked in order to identify real-time multimedia streams. Also checked are all multimedia streams that are defined as real-time streams and are either in the working (on) state or were previously on but are currently off. Multimedia streams with real-time status are checked regularly in order to store up-to-date information about them; the check determines:
  • the state of the multimedia stream (for example, whether the server transmitting the multimedia stream is on or off);
  • the technical characteristics and description of the multimedia stream, with their changes tracked.
  • Stream control module 7 receives from stream database 4 a list of real-time multimedia streams whose type, state, technical characteristics and/or description have changed. Stream control module 7 then records a change mark in relationship database 6 for all links to web pages that have associative links with the streams in this list. Note that a single link to a multimedia stream can be associated with many links to web pages.
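  • The many-to-many association and the change marks can be sketched with two in-memory indexes. The class RelationshipDB and its method names are hypothetical; the actual storage layout of relationship database 6 is not given in the text.

```python
from collections import defaultdict


class RelationshipDB:
    """Illustrative in-memory stand-in for the relationship database."""

    def __init__(self):
        self.pages_by_stream = defaultdict(set)  # stream link -> page links
        self.streams_by_page = defaultdict(set)  # page link -> stream links
        self.changed_pages = set()               # pages carrying a change mark

    def link(self, page_url, stream_url):
        """Record that page_url contains a link to stream_url."""
        self.pages_by_stream[stream_url].add(page_url)
        self.streams_by_page[page_url].add(stream_url)

    def mark_stream_changed(self, stream_url):
        """Mark every page associated with a changed stream."""
        # .get avoids creating an empty entry for an unknown stream
        self.changed_pages |= self.pages_by_stream.get(stream_url, set())
```

  • Keeping an index in each direction lets a change in one stream reach all its pages without scanning every page record.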
  • Schedule management module 8 retrieves from relationship database 6 a list of links to web pages that carry a mark about changes in real-time multimedia streams. For these links, the degree of change in the streams is determined, and the significance coefficient and the check period are calculated.
  • Schedule management module 8 saves the results for the received list of links to schedule database 9, changing the start time of the next check of the links to web pages, and also adds links to web pages to rechecking or excludes them from it. Links that are due according to schedule database 9 are sent to data loading module 2 for download and analysis.
  • The algorithm for checking links in the stream database for the presence of real-time multimedia streams includes the following steps:
  • Protocol headers can additionally be used.
  • If the value of the parameter characterizing the duration of the stream (Duration) lies in the range from zero to the specified limit, the module reconnects to the media server and again determines the value of this parameter and of the parameter characterizing the position from which playback starts (Start Time). These values are compared with the values obtained during the initial connection. If at least one of the values does not match, it is concluded that the analyzed stream is a multimedia source broadcasting in real time. If the values coincide, signs of a real-time multimedia stream are searched for in the server response headers, and if such signs are detected it is concluded that the stream being checked is a multimedia source broadcasting in real time.
  • The limit for the parameter characterizing the duration of the stream is selected experimentally and can lie in the range from 5 to 9 hours.
  • If the values of the stream duration and/or playback position parameters are not received from the server, it is concluded that the stream being checked is a multimedia source broadcasting in real time.
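  • The decision steps described above can be sketched as a small function. Several parts are assumptions made for illustration: the connect callable stands in for the multimedia client, the 7-hour limit is simply a value inside the stated 5 to 9 hour range, and the "icy-" header prefix (used by SHOUTcast/Icecast-style servers) is only one possible sign of a live stream. Cases the text does not cover return None, and returning False when no sign is found is likewise an assumption.

```python
from typing import Callable, Mapping, Optional, Tuple

DURATION_LIMIT_S = 7 * 3600  # ASSUMPTION: a value within the 5-9 hour range from the text

# (duration in seconds, start time in seconds, server response headers)
Probe = Tuple[Optional[float], Optional[float], Mapping[str, str]]


def headers_look_live(headers: Mapping[str, str]) -> bool:
    """ASSUMPTION: treat SHOUTcast/Icecast-style 'icy-*' headers as a live sign."""
    return any(name.lower().startswith("icy-") for name in headers)


def is_live(connect: Callable[[], Probe]) -> Optional[bool]:
    """Return True/False for live/not live, or None where the text gives no rule."""
    duration, start, headers = connect()
    if duration is None or start is None:
        return True  # parameters not received from the server: live
    if 0 <= duration <= DURATION_LIMIT_S:
        duration2, start2, headers2 = connect()  # reconnect and compare
        if duration2 != duration or start2 != start:
            return True
        return headers_look_live(headers2)
    return None  # duration above the limit: not covered by the steps shown
```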
  • The stream checking module implementing the above algorithm contains:
  • a multimedia client configured to connect to a media server via a link and download information about the media stream, including the characteristics of the stream in a given format and/or a specific part of the stream intended for playback on the client side and/or information about the protocol headers received from the server,
  • a unit for analyzing information about the media stream, configured to check the received information by searching for signs indicating that the analyzed stream is a multimedia source broadcast in real time; any sequence of characters or bytes in the media stream on the basis of which it can be concluded that the media stream meets the criterion of a "live" stream may serve as such a sign.
  • As the multimedia client, applications such as MPlayer or VLC media player can be used, as well as any other product, including a self-developed multimedia client, configured to communicate with the server, process the data and provide the necessary information.
  • The technology for determining the type of stream consists in analyzing meta-information obtained from the media stream itself.
  • The media client connects to the media server and then receives from it meta-information about the stream in a given format, as well as a certain part of the stream intended for playback on the client side.
  • the received meta-information, as well as the transmitted media stream buffer pass the verification stage in order to determine the type of stream.
  • the main purpose of the check is to analyze the data and search for signs that indicate that the analyzed stream is a multimedia source, the broadcast of which is carried out in real time.
  • a characteristic feature of a “live” stream (content) is the inability to perform “fast forward” with respect to it using the means of a client playback application.
  • Typical examples of “live” AV content on the Internet are television (TV) and radio broadcasting on-air studios, special Internet broadcasting by professional and amateur studios, and images from a webcam for streaming broadcasting.
  • Schedule management module 8 determines the changes in the stream. If the characteristics of the stream have changed, then for the link being checked the start time of the next check is set equal to the current time. These characteristics include descriptive data (the description of the stream, its title, an indication of its owner, a link to the site of the stream, or any other data transmitted within the stream and reflecting its essence) and technical characteristics (bit rate, format, information about audio or video codecs, or any other technical characteristics of the stream). Setting the start time of the next check of a link to a web page equal to the current time causes the link to be checked immediately.
  • Schedule management module 8 changes the significance coefficient, the period and the time of the next check of links to web pages. If a multimedia stream is on, the significance coefficient of the link to the web page increases by one; if the stream was on and is now off, the coefficient decreases by one. For example, one link to a web page may have more than one "working" stream; if there are three, the significance coefficient is three. In this particular case the significance coefficient equals the number of links to real-time multimedia streams located on the web page.
  • The rules for choosing the significance coefficient for checking a link to a web page need not depend on the number of links to real-time multimedia streams located at the web page's address and can be defined on the basis of other conditions. If the checked link to a web page has, for example, two links to real-time multimedia streams and the state of both has changed, for example both streams have stopped working (are off), the significance coefficient of the web page link becomes zero, which excludes the page from checking. Based on the significance coefficient, the check period of the link to the web page is calculated, and the order in which data loading module 2 loads the web pages that are due for checking is established.
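  • For the particular case in which the significance coefficient equals the number of working real-time streams on the page, the calculation can be sketched as below. The input layout (a mapping from stream link to a pair of flags) and the function names are assumptions made for illustration.

```python
from typing import Dict, Tuple


def significance(stream_states: Dict[str, Tuple[bool, bool]]) -> int:
    """Count streams that are of the real-time type and currently on.

    stream_states maps a stream link to (is_real_time, is_on).
    """
    return sum(1 for is_real_time, is_on in stream_states.values()
               if is_real_time and is_on)


def should_exclude(coefficient: int) -> bool:
    """A page whose coefficient has dropped to zero leaves the schedule."""
    return coefficient == 0
```

  • With three working streams the coefficient is three; with two streams that are both off it is zero and the page is excluded, matching the examples in the text.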
  • The rule for calculating the check period may vary depending on the events that triggered the recheck of the web page.
  • the proposed system can be implemented on one or more server computers, combined to jointly implement the prescribed functionality, while the above modules can be implemented in software and hardware components of these server computers, known to specialists and widely used in technology.
  • the above databases can be implemented on one or more commonly known computer-readable media, for example, hard disk drives, RAID arrays, solid state memory, etc.
  • The data loading module can be connected to and interact with the Internet using well-known wired and/or wireless network technologies and equipment, in particular the HTTP/TCP/IP protocol stack.
  • the operator can use any known terminal equipment that supports the ability to execute commands of the database interaction language (for example, SQL).
  • Such equipment may be, for example, a suitably configured personal / laptop / handheld computer.
  • the following are specific examples of the schedule for rechecking links to web pages with real-time multimedia streams placed on them.
  • the first example shows the appearance of new links on a web page in the schedule database 9.
  • The search engine found a new web page at the link Reference_1; analysis of the page revealed two links to the multimedia streams Stream_1 and Stream_2.
  • Information about the web page, which contains various meta-information about the page itself and the detected links to multimedia streams, is transmitted to data management module 3 (see Fig. 3).
  • The data management module sends the links to the streams to stream database 4 and at the same time requests the status of the transferred streams. If a link to a multimedia stream has already been stored in stream database 4, the data management module receives information about it; if the link is new, the stream information remains unknown until the stream is verified by stream checking module 5.
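  • This save-and-query step can be sketched with a plain dictionary standing in for stream database 4. The StreamType enumeration and the function name save_stream_links are hypothetical; the point shown is only that a new link is stored with an unknown type until it has been verified.

```python
from enum import Enum
from typing import Dict, Iterable


class StreamType(Enum):
    UNKNOWN = "unknown"    # saved but not yet verified
    LIVE = "live"          # verified real-time stream
    NOT_LIVE = "not_live"  # verified, not a real-time stream


def save_stream_links(stream_db: Dict[str, StreamType],
                      links: Iterable[str]) -> Dict[str, StreamType]:
    """Insert unseen links as UNKNOWN and return the current type of every link."""
    return {link: stream_db.setdefault(link, StreamType.UNKNOWN) for link in links}
```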
  • Next, data management module 3 checks the information about the link Reference_1 in relationship database 6 to determine the associative relationship of this link to a web page with the links to multimedia streams obtained from a previous check of the page. If this link to a web page is not in the relationship database (that is, it is new), it is added there together with the relationship between this link to the web page and the links to multimedia streams (see Fig. 3).
  • Information about this link to a web page will remain unchanged in the relationship database until the links to the multimedia streams found on the page have been checked.
  • Relationship database 6 will be changed, and further work with this link to the web page carried out, only if checking the links to the streams determines that at least one of them is a link to a real-time multimedia stream.
  • Stream checking module 5 takes the links to be checked from stream database 4 and determines that one of them is a link to a real-time multimedia stream (see Table No. 1).
  • Stream control module 7 requests from stream database 4 information about real-time streams whose type, state or characteristics have changed. In response, stream control module 7 receives the link to the stream Stream_1, for which it is indicated that the stream has changed to the "live" stream type (see Table No. 2).
  • Stream control module 7 records a mark in relationship database 6, for all links to web pages associated with this link to a multimedia stream, that the stream has changed to the status of a "live" stream.
  • the schedule management module 8 takes from the relationship database 6 all links to web pages that have changes in the type, state or characteristics of links to real-time streams.
  • Schedule management module 8 will receive a Reference l link for which the number of real-time streams and a mark on changes in the streams will be indicated (see table H).
  • The data loading module 2 receives from the schedule database 9 (see Table No. 5) three links to web pages that are due for scanning.
  • The web page at Reference_5 contains two links to real-time streams, and Reference_6 shares one stream with it; the web page at Reference_7 contains an independent link to a stream that is not shared with any other web page.
  • The data loading module 2 downloads the web pages from the specified links. Analysis of the downloaded documents found links to streams with the associations shown in Fig. 5.
  • The data found is transmitted to the data management module 3.
  • The data management module sends the links found to the stream database 4 in order to save new links to streams and to obtain information about already known streams (see Fig. 6).
  • The information received from the stream database 4 indicates that the links Stream_10 and Stream_50 have been checked and are real-time streams; the link Stream_11 pointed to a real-time stream that is now in the off state; and Stream_51 is a new link to a multimedia stream that requires verification by the stream verification module 5.
  • The data management module then checks the previous associations for these web page links in the relationship database 6. The check shows that the links to multimedia streams on some web pages have changed, which leads to changes in the associations stored in the relationship database 6 for these web page links (see Fig. 7).
  • The data management module notes that the web page at Reference_7 no longer contains a link to Stream_12 and assigns new links to it, indicating that it contains Stream_50 and Stream_51. Along with the changes in the associations, the stream information obtained from the stream database 4 is recorded: the status of the link Stream_11 has changed to indicate that the stream is turned off, while the links Stream_10 and Stream_50 are operational and are real-time multimedia streams. Because Stream_51 is a new link to a stream, there is not yet any information about it that could affect the schedule database 9.
  • The schedule management module 8 requests from the relationship database 6 the web page links whose real-time streams have changed (see Table No. 7).
  • The schedule management module makes changes to the schedule database 9 (see Table No. 8).
  • Reference_5: the significance coefficient decreases and, as a result, the verification period increases.
  • Reference_6: excluded from scanning because it currently has no working real-time streams.
  • Reference_7: there are currently no changes. Before the scheduled check this web page link was associated with Stream_12; after the check it points to two streams, but the stream type is known for only one of them, because the second link is new. The significance coefficient for this web page link therefore remains equal to 1.
  • The data loading module 2 had revealed a new link, Stream_51, whose type was not yet determined; after the link was checked by the stream verification module 5, it was found to point to a live stream.
  • The flow control module 7 requests data from the stream database 4 and receives information (see Table No. 9) that Stream_51 is a link to a real-time multimedia stream.
  • The flow control module makes changes to the relationship database 6: every web page link associated with Stream_51 is marked with the change in stream type.
  • The schedule management module 8 again requests from the relationship database 6 information about web page links whose real-time streams have changed in type, state or characteristics, and receives data (see Table No. 10) about the changes for Reference_7.
  • The significance coefficient, the verification period and the start time of the next verification are calculated, after which the changes are written to the schedule database 9 (see Table No. 11).
  • Reference_7: since it now has two links to real-time streams, its significance coefficient increases and the recheck period changes accordingly, which changes the time of the next check of this web page link.
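
The second example can be sketched in a few lines of Python. This is only an illustrative sketch, not the claimed implementation: the text of this section does not give the formula for the significance coefficient, so the code assumes that it equals the number of associated stream links currently known to be live and that the verification period is a base period divided by that coefficient. The function names, the base period of 3600 seconds and the page-to-stream associations (inferred from the description, since Figs. 5 to 7 are not reproduced here) are all hypothetical.

```python
# Stream states as they might be stored in the stream database 4 (names assumed).
LIVE, OFF, UNKNOWN = "live", "off", "unknown"


def diff_associations(old_links, new_links):
    """Return (added, removed) stream links for one web page link."""
    old_set, new_set = set(old_links), set(new_links)
    return sorted(new_set - old_set), sorted(old_set - new_set)


def significance(stream_links, stream_db):
    """Assumed rule: one point per associated stream link known to be live."""
    return sum(1 for link in stream_links if stream_db.get(link, UNKNOWN) == LIVE)


def build_schedule(relationships, stream_db, base_period_s=3600, now_s=0):
    """Recompute coefficient, period and next start time for every page link."""
    schedule = {}
    for page, links in relationships.items():
        coefficient = significance(links, stream_db)
        if coefficient == 0:
            # No working real-time streams: the page is excluded from scanning.
            continue
        period_s = base_period_s / coefficient  # assumed inverse relation
        schedule[page] = {
            "coefficient": coefficient,
            "period_s": period_s,
            "next_check_s": now_s + period_s,
        }
    return schedule


# State after the scheduled scan (associations are assumptions, see lead-in).
stream_db = {
    "Stream_10": LIVE,
    "Stream_11": OFF,
    "Stream_50": LIVE,
    "Stream_51": UNKNOWN,  # new link, not yet verified by module 5
}
relationships = {
    "Reference_5": ["Stream_10", "Stream_11"],
    "Reference_6": ["Stream_11"],
    "Reference_7": ["Stream_50", "Stream_51"],
}

# Reference_7 previously pointed to Stream_12 only.
added, removed = diff_associations(["Stream_12"], relationships["Reference_7"])

before = build_schedule(relationships, stream_db)

# The stream verification module 5 later finds that Stream_51 is live.
stream_db["Stream_51"] = LIVE
after = build_schedule(relationships, stream_db)
```

Under these assumptions, Reference_6 drops out of the schedule, Reference_7 keeps a coefficient of 1 while Stream_51 is unverified, and its coefficient rises to 2 (halving the assumed period) once Stream_51 is confirmed as a live stream.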

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method for checking web pages containing real-time multimedia streams. The method consists of loading web pages for checking according to a schedule, analyzing the loaded web pages to detect links to multimedia streams in them, and saving the links found in a stream database; information on which web page a multimedia stream belongs to is recorded in a relationship database. The streams are then verified to determine their type, that is, whether the multimedia stream is a real-time stream, and to reveal changes in stream type and/or stream state and/or stream characteristics, after which information about the changes is saved in the stream database. In the relationship database, a mark describing the nature of the changes is made; new links to web pages on which links to real-time multimedia streams were found are added to the schedule, and/or the verification period is modified for web page links already in the schedule for which stream changes were detected, and/or the start time of the next verification is modified. The computer-implemented system comprises modules and databases that reproduce the algorithm of the method.
PCT/RU2013/001055 2013-07-26 2013-11-25 Method for checking web pages containing real-time multimedia streams, and computer-implemented system for carrying out the method WO2015012720A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2013134965 2013-07-26
RU2013134965/08A RU2530672C1 (ru) 2013-07-26 2013-07-26 Method for checking web pages for the presence of real-time multimedia streams, and computer-implemented system for carrying out the method

Publications (1)

Publication Number Publication Date
WO2015012720A1 (fr) 2015-01-29

Family

ID=52393617

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2013/001055 WO2015012720A1 (fr) 2013-07-26 2013-11-25 Method for checking web pages containing real-time multimedia streams, and computer-implemented system for carrying out the method

Country Status (2)

Country Link
RU (1) RU2530672C1 (fr)
WO (1) WO2015012720A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2399090C2 (ru) * 2008-12-03 2010-09-10 Общество С Ограниченной Ответственностью "Мералабс" System and method for Internet search of real-time multimedia content
US7886042B2 (en) * 2006-12-19 2011-02-08 Yahoo! Inc. Dynamically constrained, forward scheduling over uncertain workloads
US8386459B1 (en) * 2005-04-25 2013-02-26 Google Inc. Scheduling a recrawl

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689377A (zh) * 2019-09-30 2020-01-14 北京达佳互联信息技术有限公司 Data detection method, apparatus, and electronic device
CN110689377B (zh) * 2019-09-30 2023-04-18 北京达佳互联信息技术有限公司 Data detection method, apparatus, and electronic device

Also Published As

Publication number Publication date
RU2530672C1 (ru) 2014-10-10

Similar Documents

Publication Publication Date Title
US9594826B2 (en) Co-selected image classification
US10572565B2 (en) User behavior models based on source domain
US10277696B2 (en) Method and system for processing data used by creative users to create media content
US7966341B2 (en) Estimating the date relevance of a query from query logs
US20090043749A1 (en) Extracting query intent from query logs
US20090271391A1 (en) Method and apparatus for rating user generated content in seach results
US20090216741A1 (en) Prioritizing media assets for publication
US10839013B1 (en) Generating a graphical representation of relationships among a set of articles and information associated with the set of articles
US20100042615A1 (en) Systems and methods for aggregating content on a user-content driven website
US20170155939A1 (en) Method and System for Processing Data Used By Creative Users to Create Media Content
US10691664B1 (en) User interface structural clustering and analysis
CN113609374A (zh) Content-push-based data processing method, apparatus, device, and storage medium
US11108717B1 (en) Trends in a messaging platform
RU2399090C2 (ru) System and method for Internet search of real-time multimedia content
US20200151227A1 (en) Computing system with dynamic web page feature
CN112035534A (zh) Real-time big data processing method, apparatus, and electronic device
US20090006354A1 (en) System and method for knowledge based search system
CN111104583B (zh) Live-streaming room recommendation method, storage medium, electronic device, and system
US8935285B2 (en) Searchable and size-constrained local log repositories for tracking visitors' access to web content
RU2530671C1 (ru) Method for checking web pages for the presence of target real-time audio and/or video (AV) content
RU2530672C1 (ru) Method for checking web pages for the presence of real-time multimedia streams, and computer-implemented system for carrying out the method
US20230107935A1 (en) User interfaces for refining video group packages
CN106156024B (zh) Information processing method and server
CN111970327A (zh) News dissemination method and system based on big data processing
TW202011231A (zh) Data analysis method and data analysis system

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 13890248; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 13890248; Country of ref document: EP; Kind code of ref document: A1