WO2019166858A1

WO2019166858A1 - File hosting service in cloud

Info

Publication number: WO2019166858A1
Application number: PCT/IB2018/051301
Authority: WO
Inventors: Pratik Sharma
Original assignee: Pratik Sharma
Priority date: 2018-03-01
Filing date: 2018-03-01
Publication date: 2019-09-06

Abstract

We provide a file hosting service in the cloud in which each user is given a unique identification and password. Users can upload media content such as video, audio, static web pages or files, documents or presentations, etc. for sharing with other users, who can play back the media content or retrieve the documents on their devices. Users upload these files along with a title, a short description and any authors or owners, if applicable. After authentication, other users can view, comment on or like a file by providing its unique file name. Instead of providing a file name, users can also retrieve one or more files with a search query consisting of a keyword or a long phrase or sentence.

Description

File Hosting Service in Cloud

In this invention we provide a file hosting service in the cloud in which each user is given a unique identification and password. Users can upload media content such as video, audio, static web pages or files, documents or presentations, etc. for sharing with other users, who can play back the media content or retrieve the documents on their devices. Users upload these files along with a title, a short description and any authors or owners, if applicable. After authentication, other users can view, comment on or like a file by providing its unique file name (uploaded files are stored in the cloud under a unique file name given by the user). For each file we maintain a list of users who have viewed or are viewing it, a list of users who have commented along with their comments, and a list of users who have liked the file. The comment list is kept either in alphabetical order of user identification or in chronological order of the comments, along with priority, with a filter available for the same. Instead of providing a file name, users can also retrieve one or more files with a search query consisting of a keyword or a long phrase or sentence. For a keyword, the file hosting service performs a full-text search only on the title, description, and author or owner (as applicable) of all files and returns the list of relevant files ranked so that the file with the highest frequency of the keyword is first and the file with the lowest frequency of the keyword is last. Similarly, for a long phrase or sentence we perform semantic parsing of the phrase or sentence to obtain a formal representation of its meaning. For this case we form a cluster of files with similar semantic context with respect to the title and description of the files.
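The keyword case described above can be sketched as follows. This is only an illustration under stated assumptions: the `HostedFile` record, the `tokenize` helper and the `keyword_search` function are hypothetical names that do not appear in the original text, and "frequency" is taken to mean the number of exact word matches across the title, description and author fields.

```python
import re
from dataclasses import dataclass


@dataclass
class HostedFile:
    # Hypothetical record; the field names are assumptions for illustration.
    name: str          # unique file name given by the uploading user
    title: str
    description: str
    author: str = ""   # author or owner, if applicable


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_frequency(hosted_file, keyword):
    """Count occurrences of the keyword in title, description and author only."""
    words = tokenize(
        f"{hosted_file.title} {hosted_file.description} {hosted_file.author}"
    )
    return words.count(keyword.lower())


def keyword_search(files, keyword):
    """Return names of matching files, highest keyword frequency first."""
    scored = [(keyword_frequency(f, keyword), f) for f in files]
    hits = [(count, f) for count, f in scored if count > 0]
    # Sort on the count only; ties keep their original upload order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [f.name for _, f in hits]


sample_files = [
    HostedFile("intro.mp4", "Cloud intro", "A short intro video", "A. Author"),
    HostedFile("cloud-deck.pdf", "Cloud storage in the cloud",
               "Slides about cloud hosting", "B. Author"),
    HostedFile("notes.txt", "Meeting notes", "Weekly sync"),
]
print(keyword_search(sample_files, "cloud"))
```

Files that never mention the keyword are left out of the result in this sketch; the original text does not say whether such files should be returned at the end of the list instead.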
We do the above by performing Natural Language Processing on the title and description of all files, identifying important words with syntactic analysis and applying the Word Sense Disambiguation System for better file classification. We can further visualize the documents or files by representing each file by a vector which specifies how many times each word occurs in the title or description of the file (the word frequencies). These counts are weighted to reflect the importance of each word. The weighting is the inverse of the log of the occurrence of each word in the titles or descriptions of the different files (the inverse document frequency). This vector of weighted counts is called a "bag of words" representation. Words from a specific list of "stop words" (such as function words) are not included in the representation. Given a set of such document vectors (vectors for the terms occurring in the title or description of each file), we apply the Self Organising Maps algorithm, which finds a partitioning of the files into clusters. The range of files in the collection can then be visualized by displaying each cluster's topic at the cluster's position on a 2-dimensional map. The algorithm searches the space of clusterings and the space of position assignments simultaneously, trying to find a global optimum for two criteria. The first criterion is that the titles or descriptions of the files within a given cluster are similar to each other. This property means that each cluster has a coherent topic. The second criterion is that clusters which have positions next to each other on the map (called "neighbours") contain files whose titles or descriptions are similar to those of the files in the neighbouring cluster. This property means that the topics of clusters change continuously as one moves across the map, making it easier for a viewer to understand the range of files in the collection than would be possible with an unstructured list of topics. We return the files from a cluster to the user for the long phrase or sentence query and rank them such that the file in the cluster with the highest number of semantically related relevant terms is first and the file with the lowest number of semantically related relevant terms is last. Finally, we transcode the media file, or any file, from its source format into versions that will play back or be rendered on user devices such as smartphones, tablets, personal computers, etc. Typically, if space is not a constraint, we store the transcoded formats of each file for rendering on various user devices so that we do not have to transcode at run time.
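The weighted bag-of-words vectors and the map-based clustering described above can be sketched roughly as follows. Everything named here is an assumption made for illustration: the stop-word list, the function names, the grid size and the training schedule are hypothetical. The weighting uses a common smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1, which may differ from the exact weighting intended in the text, and the Self Organising Map is a simple online version that settles heuristically and is not guaranteed to reach a global optimum.

```python
import math
import random
import re
from collections import Counter

# Illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split into words and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def tfidf_vectors(texts):
    """Build one unit-length weighted word-count vector per title/description."""
    docs = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    doc_freq = {w: sum(1 for d in docs if w in d) for w in vocab}
    # Smoothed inverse document frequency (one common variant, an assumption).
    idf = {w: math.log((1 + n) / (1 + doc_freq[w])) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        v = [d[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def best_cell(weights, vector):
    """Return the map position whose weight vector is closest to the input."""
    return min(weights, key=lambda cell: sum(
        (a - b) ** 2 for a, b in zip(weights[cell], vector)))


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a small rows x cols Self Organising Map on the document vectors."""
    if not vectors:
        return {}
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {cell: [rng.random() for _ in range(dim)] for cell in grid}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # decays from 1 towards 0
        rate = 0.5 * frac                        # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for v in vectors:
            bmu = best_cell(weights, v)
            for cell in grid:
                dist2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                # Neighbouring cells are pulled towards the input as well,
                # which is what keeps adjacent clusters on similar topics.
                pull = rate * math.exp(-dist2 / (2 * radius * radius))
                w = weights[cell]
                for i in range(dim):
                    w[i] += pull * (v[i] - w[i])
    return weights


def cluster_files(texts, rows=2, cols=2, epochs=50, seed=0):
    """Assign each title/description text to a cell on the 2-dimensional map."""
    _, vectors = tfidf_vectors(texts)
    weights = train_som(vectors, rows, cols, epochs, seed)
    return [best_cell(weights, v) for v in vectors]
```

Calling `cluster_files` on a list of "title plus description" strings returns one `(row, column)` map position per file. Files that share a position form a cluster, and identical texts always land in the same cell. Ranking within a cluster by semantically related terms, and the semantic parsing of the query itself, are not covered by this sketch.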

Claims

File Hosting Service in Cloud

Following is the claim for this invention:

1. In this invention we provide a file hosting service in the cloud in which each user is given a unique identification and password. Users can upload media content such as video, audio, static web pages or files, documents or presentations, etc. for sharing with other users, who can play back the media content or retrieve the documents on their devices. Users upload these files along with a title, a short description and any authors or owners, if applicable. After authentication, other users can view, comment on or like a file by providing its unique file name (uploaded files are stored in the cloud under a unique file name given by the user). For each file we maintain a list of users who have viewed or are viewing it, a list of users who have commented along with their comments, and a list of users who have liked the file. The comment list is kept either in alphabetical order of user identification or in chronological order of the comments, along with priority, with a filter available for the same. Instead of providing a file name, users can also retrieve one or more files with a search query consisting of a keyword or a long phrase or sentence. For a keyword, the file hosting service performs a full-text search only on the title, description, and author or owner (as applicable) of all files and returns the list of relevant files ranked so that the file with the highest frequency of the keyword is first and the file with the lowest frequency of the keyword is last. Similarly, for a long phrase or sentence we perform semantic parsing of the phrase or sentence to obtain a formal representation of its meaning. For this case we form a cluster of files with similar semantic context with respect to the title and description of the files.
We do the above by performing Natural Language Processing on the title and description of all files, identifying important words with syntactic analysis and applying the Word Sense Disambiguation System for better file classification. We can further visualize the documents or files by representing each file by a vector which specifies how many times each word occurs in the title or description of the file (the word frequencies). These counts are weighted to reflect the importance of each word. The weighting is the inverse of the log of the occurrence of each word in the titles or descriptions of the different files (the inverse document frequency). This vector of weighted counts is called a "bag of words" representation. Words from a specific list of "stop words" (such as function words) are not included in the representation. Given a set of such document vectors (vectors for the terms occurring in the title or description of each file), we apply the Self Organising Maps algorithm, which finds a partitioning of the files into clusters. The range of files in the collection can then be visualized by displaying each cluster's topic at the cluster's position on a 2-dimensional map. The algorithm searches the space of clusterings and the space of position assignments simultaneously, trying to find a global optimum for two criteria. The first criterion is that the titles or descriptions of the files within a given cluster are similar to each other. This property means that each cluster has a coherent topic. The second criterion is that clusters which have positions next to each other on the map (called "neighbours") contain files whose titles or descriptions are similar to those of the files in the neighbouring cluster. This property means that the topics of clusters change continuously as one moves across the map, making it easier for a viewer to understand the range of files in the collection than would be possible with an unstructured list of topics. We return the files from a cluster to the user for the long phrase or sentence query and rank them such that the file in the cluster with the highest number of semantically related relevant terms is first and the file with the lowest number of semantically related relevant terms is last. Finally, we transcode the media file, or any file, from its source format into versions that will play back or be rendered on user devices such as smartphones, tablets, personal computers, etc. Typically, if space is not a constraint, we store the transcoded formats of each file for rendering on various user devices so that we do not have to transcode at run time. The above novel technique of providing file hosting services in the cloud is the claim for this invention.