WO2018039747A1

WO2018039747A1 - Method and system for intimate message and response video composing

Info

Publication number: WO2018039747A1
Application number: PCT/AU2017/050955
Authority: WO
Inventors: Jacob Gough; Arienne Sellers; Darran Franks; Paul DE LANGE; Arash Daneshvar; Josh Lapham; William Rapert; Grady Vincent
Original assignee: Kwickie International Ltd
Priority date: 2016-09-02
Filing date: 2017-09-04
Publication date: 2018-03-08

Abstract

The present invention relates generally to the field of video messaging and particularly to a system allowing creation of an electronic message emulating a synchronous fan video message and second person video reply, by compositing a video message and video reply captured at different times (asynchronously) in a manner that balances processing and file size between the different hardware components of the system.

Description

METHOD AND SYSTEM FOR INTIMATE MESSAGE AND RESPONSE VIDEO COMPOSING

TECHNICAL FIELD

[0001] The present invention relates generally to the field of video messaging and particularly to emulating a synchronous fan video message and second person video reply, by compositing a video message and video reply captured at different times (asynchronously).

BACKGROUND ART

[0002] A previous patent application provided a method and system for the recording of separate asynchronous video conversations and compositing them into one video, to simulate a real-time, synchronous conversation.

[0003] This raised a specific problem of how to pre-process videos from a wide range of mobile devices, how to compose (or stitch) those videos together visually while maintaining participant intimacy and viewer engagement, how to coerce the audio into the highest clarity possible, and then how to store and prepare the files for distribution. Achieving all of these elements in a time frame that is acceptable to users while remaining cost effective is not trivial.

[0004] It will be clearly understood that, if a prior art publication is referred to herein, this reference does not constitute an admission that the publication forms part of the common general knowledge in the art in Australia or in any other country.

SUMMARY OF INVENTION

[0005] The present invention is directed to a method and system for intimate message and response video composing, which may at least partially overcome at least one of the abovementioned disadvantages or provide the consumer with a useful or commercial choice.

[0006] With the foregoing in view, the present invention in one form, resides broadly in a method for intimate message and response video composing, the method including the steps of

a) a first user capturing a first person video message having an audio component and a visual component using a first person personal computing device,

b) applying at least noise suppression and gain control to the audio component of the first person video message on the first person personal computing device,

c) a second user capturing a second person video message having an audio component and a visual component using a second person personal computing device,

d) applying at least echo cancellation, noise suppression and gain control to the audio component of the second person video message on the second person personal computing device,

e) normalising the visual component of both the first person video message and the second person video message to substantially conform in size, aspect ratio and format,

f) normalising the audio component of both the first person video message and the second person video message to substantially conform in audio quality and format,

g) normalising an audio level between the respective audio components of the first person video message and the second person video message; and

h) output of a concatenated video master file of a composed message including the normalised audio components and normalised video components of both the first person video message and the second person video message to emulate a synchronous video conversation from asynchronously captured video messages.
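Step g) can be illustrated with a minimal sketch. The specification does not state how the audio level is normalised between the two messages, so the approach below (matching root-mean-square levels and then backing both clips off equally if any sample would clip) is an assumption chosen for illustration, as are the function names `rms` and `match_levels` and the representation of audio as lists of floating point samples between -1.0 and 1.0.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_levels(first, second, peak_limit=1.0):
    """Scale two clips to a shared RMS level (assumed normalisation rule).

    The shared target is the mean RMS of the audible clips. If the scaled
    audio would exceed peak_limit, both clips are reduced by the same
    factor so that their relative levels stay matched.
    """
    levels = [rms(first), rms(second)]
    audible = [level for level in levels if level > 0.0]
    if not audible:
        return list(first), list(second)
    target = sum(audible) / len(audible)

    scaled = []
    for clip, level in zip((first, second), levels):
        gain = target / level if level > 0.0 else 1.0
        scaled.append([s * gain for s in clip])

    peak = max((abs(s) for clip in scaled for s in clip), default=0.0)
    if peak > peak_limit:
        factor = peak_limit / peak
        scaled = [[s * factor for s in clip] for clip in scaled]
    return scaled[0], scaled[1]
```

For example, a quiet first person clip and a loud second person clip come out at the same measured level, which is the property a viewer would notice when the two messages are played back to back.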

[0007] In another form, the invention resides in a system for intimate message and response video composing, the system including

a) at least one computer server or computer network operating a primary software application

b) a first person personal computing device having an audio capture device, a video capture device and data transmission capability and operating a second software application used to capture a first person video message having an audio component and a visual component, and to apply at least noise suppression and gain control to the audio component of the first person video message on the first person personal computing device to create a pre-processed first person video message and to upload the pre-processed first person video message to the at least one computer server or computer network

c) a second person personal computing device having an audio capture device, a video capture device and data transmission capability and operating a second software application used to capture a second person video message having an audio component and a visual component, and to apply at least echo cancellation, noise suppression and gain control to the audio component of the second person video message on the second person personal computing device to create a pre-processed second person video message and to upload the pre-processed second person video message to the at least one computer server or computer network

d) the at least one computer server or computer network normalising the visual component of both the pre-processed first person video message and the pre-processed second person video message to substantially conform in size, aspect ratio and format, normalising the audio component of both the pre-processed first person video message and the pre-processed second person video message to substantially conform in audio quality and format, normalising an audio level between the respective audio components of the normalised first person video message and the normalised second person video message; and

e) output of a concatenated video master file of a composed message including the

[0008] In another form, the invention resides in a system for intimate message and response video composing, the system including a first person personal computing device having an audio capture device, a video capture device and data transmission capability and operating a second software application used to capture a first person video message having an audio component and a visual component, and to apply at least noise suppression and gain control to the audio component of the first person video message on the first person personal computing device to create a pre-processed first person video message and to upload the pre-processed first person video message to at least one computer server or computer network.

[0009] In another form, the invention resides in a system for intimate message and response video composing, the system including a second person personal computing device having an audio capture device, a video capture device and data transmission capability and operating a second software application used to capture a second person video message having an audio component and a visual component, and to apply at least echo cancellation, noise suppression and gain control to the audio component of the second person video message on the second person personal computing device to create a pre-processed second person video message and to upload the pre-processed second person video message to at least one computer server or computer network.

[0010] In another form, the invention resides in a system for intimate message and response video composing, the system including the at least one computer server or computer network receiving at least one first person video message having an audio component and a visual component, receiving at least one second person video message having an audio component and a visual component, normalising the visual component of both the first person video message and the second person video message to substantially conform in size, aspect ratio and format, normalising the audio component of both the first person video message and the second person video message to substantially conform in audio quality and format, normalising an audio level between the respective audio components of the normalised first person video message and the normalised second person video message; and output of a concatenated video master file of a composed message including the normalised audio components and normalised video components of both the first person video message and the second person video message to emulate a synchronous video conversation from asynchronously captured video messages.

[0011] The system of the present invention will preferably apply at least echo cancellation, noise suppression and gain control to the audio component of the second person video message on the second person personal computing device to create a pre-processed second person video message, upload the pre-processed second person video message to the at least one computer server or computer network, and normalise the audio and visual components of any message or message portion captured. For example, the system may be used to capture a reaction recording of a second person utilising the second person personal computing device, which may be actuated automatically upon the second user choosing to view a first person video message, in order to capture a real-time reaction. This reaction recording may be concatenated with the first person video message and the second person video message into the concatenated video master file once the steps above have been undertaken.

[0012] Being able to clearly hear what is being said, and to understand visually who is in focus in a video message that includes interaction between more than one person, is important for viewer engagement. At the same time, the emulated synchronous experience should be maintained. The "intimacy" of the connection formed between the two parties of the separate asynchronous video conversations is important to customer satisfaction. At the core of that intimacy is the capture of the separate asynchronous video conversations, pre-processing the audio component of each to ensure that the audio components are of suitable quality, normalising the visual component of each to conform in size, aspect ratio and format, and then normalising the audio level between both before outputting the polished audio and video components of each to a master video composed message file.

[0013] The present invention provides a method and system for the recording of separate asynchronous video messages and compositing them into one video, which simulates or emulates a real-time, synchronous conversation. In a most preferred embodiment, the method and system of the present invention will be used to allow a fan to interact with an influencer. The influencer can be a sports star or personality, an entertainment personality or any type of person with an image that may appeal to a fan.

[0014] The method and system of the present invention is based on a first person (for example, a fan) utilising a first person personal computing device such as a smartphone, computer, tablet computer or any other first person personal computing device to capture an audio/video message and then send that audio/video message to a second person (for example, an influencer, generally a celebrity) who has a second person personal computing device such as a smartphone, computer, tablet computer or any other second person personal computing device for playback. The system then preferably records the second person's reaction to the first person's audio/video message, allows the second person to record an audio/video message in response thereto, and then composites the messages and reaction into a single audio/video message.

[0015] Preferably, access to the system for both the first person and the second person will be via their own personal computing device.

[0016] The system will normally include a primary software application residing on a server or similar and a secondary software application operating on the personal computing device of each user. The secondary software application may be the same application regardless of whether the user is a first person or a second person, with the functionality and use of the software application being determined by the category of the user, which is preferably determined at login according to the user's unique login information. As mentioned above, the respective first person and second person personal computing devices can be of any type but will typically be a smart phone, computer tablet or other portable device having at least one communication pathway in order to communicate with the computer server or computer network operating a primary software application.
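The idea of one secondary software application whose functionality depends on the user category determined at login can be sketched as follows. The category names follow the description, but the feature names and the `features_for` helper are hypothetical and are not taken from the specification.

```python
from enum import Enum


class Category(Enum):
    FAN = "fan"
    INFLUENCER = "influencer"


# Hypothetical feature names, used only to illustrate category-based gating.
FEATURES = {
    Category.FAN: {"record_message", "follow_influencer", "view_reply"},
    Category.INFLUENCER: {"view_message", "record_reaction", "record_reply"},
}


def features_for(categories):
    """Return the union of features for every category held by the login."""
    allowed = set()
    for category in categories:
        allowed |= FEATURES[category]
    return allowed
```

A login that carries both categories simply receives the union of both feature sets, which matches the statement elsewhere in the description that a user may be a fan, an influencer, or both.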

[0017] The personal computing device preferred for use in the present invention includes a processor with on-board memory, a display, at least one input apparatus, at least one output apparatus (such as audio output, directly via speakers or similar or indirectly via a port or similar allowing the connection of speakers, visual output or similar), at least one image capture device such as a camera, at least one audio capture device such as a microphone and access to at least one communication pathway to transmit data between system components. Normally, the display will preferably be a touchscreen as many personal computing devices currently available have this feature. The advantages of the touchscreen include allowing a larger display and also allowing the display to function as a part of or as, the input apparatus. The display will also function as a video playback device. These types of devices also usually have at least one camera and at least one microphone which will allow video capture and audio capture.

[0018] Preferably, the software operating on the hardware of the system of the present invention includes a primary software application operating on the computer server or computer network. Preferably, a secondary software component is provided at the personal computing device level on each of the user personal computing devices in order to interact with the primary software application.

[0019] The primary software application is preferably the "engine" for compositing the captured video messages, responsible for receiving the various messages and recordings created or captured by a user (fan and/or influencer) and to composite these into a single message for delivery to a user. Some pre-processing steps are preferably undertaken by the secondary software component provided at the personal computing device level on each of the first user and second user personal computing devices.

[0020] The system for compositing or concatenating asynchronously captured video messages of the present invention preferably includes a secondary software application designed to operate on smartphones, tablet computers and other mobile devices that each customer and consumer will require in order to access Internet data transmission. The secondary software application will preferably be available through an application distribution platform, which is typically operated by the owner of the mobile operating system, such as the Apple App Store, Google Play, Windows Phone Store and BlackBerry App World. The secondary software application of the present invention will normally be downloaded from the application distribution platform to a target personal computing device.

[0021] Preferably, the secondary software application is provided to operate on a personal computing device with appropriate connections through the personal computing device to the computer server or computer network operating a primary software application in order to gain additional information to that present on the personal computing device. The additional information may be obtained from the computer server or computer network and/or by push notification from the computer server or computer network to the personal computing device and/or upon request from the personal computing device.

[0022] The secondary software application will preferably allow communication with the primary software application operating on the computer server or computer network. Preferably, the primary software program operating on the computer server or computer network will be more advanced and be responsible for the bulk of the processing, with the secondary software application operating on the smartphones, tablet computers and other mobile devices being typically smaller and requiring less processing power, optimised to send and receive instructions and requests, and leaving the operations requiring larger processing power to the primary software program operating on the computer server or computer network.

[0023] According to a preferred embodiment, the present invention will preferably have a number of parties associated with the system, with a party categorised into one or more general types. The preferred types of parties associated with the system include a system administrator (which can be one or more people, and/or machines in one or more locations) primarily responsible for maintenance of the system and particularly the computer server or computer network and/or primary software application, fans who ask questions and create messages and influencers who are people with which the fans wish to interact and/or be associated.

[0024] Users will normally download the secondary software application to their personal computing device. The download of the secondary software application will normally include appropriate instructions to be stored in the memory of the respective personal computing device in order to create and maintain links and associations with the computer server or computer network in order to communicate with one or more databases stored thereon.

[0025] The respective personal computing device normally provides access to one or more communications pathways in order to communicate with the computer server or computer network in order to access the system. Normally, the computer server or computer network will include one or more databases containing information about the users such that information regarding the identity of any one or more of these parties may be communicated by the respective personal computing device or the software application to ensure that the respective personal computing device requesting data from the computer server or computer network or to which data is to be sent or from which information is received, is a personal computing device of an authorised user of the system. This functionality is normally accomplished through a login facility in which the user uses a personal computing device to log into the system.

[0026] Other types of input apparatus are typically also present, including at least one voice input apparatus, typically a microphone or similar device. A biometric device could also be used.

[0027] The method of the present invention is preferably achieved by computer hardware operating software containing instructions in association with one or more communications pathways between a variety of pieces of computer hardware operating software compliant with the system, in order to achieve the method.

[0028] The computer hardware included in the system of the present invention typically includes a computer server or computer network operating the primary software application which is operated or maintained by a system administrator and which electronically stores information in relation to the users of the system and also receives data, forms the composite messages and dispatches the composite messages. The hardware also preferably includes or has access to a communication network in order to send/receive requests from users to and from the computer server or computer network.

[0029] As mentioned above, the hardware included in the system of the present invention also includes a personal computing device for each user. The respective personal computing devices will preferably be the primary points of access to the system of the present invention by the users of the system and normally interaction with the primary software application operating on the computer server or computer network will occur using the personal computing devices.

[0030] The hardware included in the system of the present invention preferably includes a computer server and one or more personal computing devices, each with access to a communication network. As mentioned above, the personal computing devices will typically be a smart phone, tablet or other computer.

[0031] The computer server or computer network will normally include a processor with memory operating instructions and a number of databases stored in electronic form. The databases will typically include at least one user database containing a unique user profile for each user of the system. At least one database of messages and/or recordings may be maintained separately from the at least one user database or, alternatively, the messages and/or recordings may be stored in the respective user profiles. It is anticipated that the at least one user database can be provided as a single database, with the designation of a user as being either a fan or influencer (or both) dependent upon the use of the system.
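One possible arrangement of these databases is sketched below, assuming the option in which messages are kept separately from a single user database. The class names, field names and the in-memory dictionaries are all hypothetical stand-ins for whatever storage the server actually uses.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    user_id: str
    display_name: str
    is_fan: bool = False          # a user may be a fan, an influencer, or both
    is_influencer: bool = False


@dataclass
class StoredMessage:
    message_id: str
    sender_id: str
    recipient_id: str
    media_path: str               # hypothetical location of the uploaded file


class SystemStore:
    """In-memory stand-in for the user database and the message database."""

    def __init__(self):
        self.users = {}
        self.messages = {}

    def add_user(self, profile):
        self.users[profile.user_id] = profile

    def add_message(self, message):
        # Only accept messages between users who have a profile.
        for user_id in (message.sender_id, message.recipient_id):
            if user_id not in self.users:
                raise KeyError(f"unknown user: {user_id}")
        self.messages[message.message_id] = message

    def messages_for(self, user_id):
        return [m for m in self.messages.values()
                if user_id in (m.sender_id, m.recipient_id)]
```

Keeping messages in their own table leaves the user profile small; the alternative described above, storing messages inside each profile, would replace `messages_for` with a direct lookup on the profile.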

[0032] The system of capturing the video messages will normally be implemented through instructions which when followed, generate one or more interfaces on a personal computing device. The instructions will normally be sent from the primary software application on the computer server or computer network to a user's personal computing device and which will then be followed in order to generate an interface in real time and update the interface according to the user's interaction with the system.

[0033] Many of these personal computing devices have touchscreens for display allowing the user to directly interact with the touch screen in order to interact with the interface. However, a normal non-touchscreen display can be used with a movable pointer or selection tool in order to allow a user to interact with the interface. One or more "buttons" are provided on the interface to allow the user to interact with the personal computing device and through the personal computing device, to interact with the system.

[0034] The generated interface will typically be updated substantially in real time according to the rules or instructions which are issued by the primary software application operating on the computer server or computer network and the at least one user database. The generated interface will also typically be updated substantially in real time according to interactions by the user(s) with the system.

[0035] In use, and from the user point of view, the system will normally include a selection interface which will preferably allow a fan to drill down into different areas in order to identify particular influencers. Once the fan has identified one or more influencers to follow, the fan can then preferably "follow" the influencer by tapping an action button which will trigger the addition of that influencer to the fan's profile. Normally, after the setup stage, every time the fan logs into the system, the fan can select from a stored list of influencers that they are following in order to undertake further action.

[0036] Selection of a particular influencer will typically trigger generation and display of an influencer profile interface. The influencer profile will typically be constructed in a manner similar to the fan profile and once created, the fan can view the influencer profile by selection of the influencer from a list. The influencer profile will typically include an image of the influencer, together with information relating to the influencer such as demographic information or statistics and the profile will particularly preferably include a newsfeed or update list of current or historical news in relation to the particular influencer.

[0037] The fan will also preferably have an action button provided on either the influencer profile interface or directly from the influencer selection interface that will allow the fan to create a fan message. Typically, the action button will allow the fan to begin recording a fan message. The action button will typically be known as a shutter button in some preferred embodiments. Other setup buttons may be provided on the interface or on a subsequent interface allowing the fan to set up the recording. The fan will typically be allowed to choose whether video is recorded or a still image is recorded together with audio, and will also be provided with a flip camera button to activate either the front or rear camera on the personal computing device as required.

[0038] Activation of the shutter button by the fan will typically cause the secondary software application operating on the personal computing device to begin recording audio and preferably video via the hardware provided on the personal computing device in order to capture the fan message. Preferably, the fan message will be limited to a particular time limit such as for example 10 seconds in length, 15 seconds in length, 25 seconds in length or 30 seconds in length. Although the fan message will normally be limited, a 30 second length limit is preferred. Activation of the shutter button again will typically pause and preferably stop the recording.
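The shutter behaviour and the preferred 30 second limit can be sketched as a small state holder. The class name `ShutterRecorder`, the `tick` method and the use of plain timestamps in seconds are assumptions made for illustration; actual capture would go through the camera and microphone interfaces of the device.

```python
class ShutterRecorder:
    """Toggle recording with a shutter press and enforce a maximum length."""

    def __init__(self, max_seconds=30.0):
        self.max_seconds = max_seconds
        self.started_at = None   # timestamp of the current recording, if any
        self.duration = None     # length of the last finished recording

    @property
    def recording(self):
        return self.started_at is not None

    def press(self, now):
        """First press starts a recording; the next press stops it."""
        if self.recording:
            self._stop(now)
        else:
            self.started_at = now
            self.duration = None

    def tick(self, now):
        """Called periodically; stops the recording once the limit is hit."""
        if self.recording and now - self.started_at >= self.max_seconds:
            self._stop(now)

    def _stop(self, now):
        self.duration = min(now - self.started_at, self.max_seconds)
        self.started_at = None
```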

[0039] At this stage, the fan may be able to review the fan message that has been captured and can choose to trash the fan message and record another.

[0040] Preferably, the secondary software application will generate and display a simple thumbnail selection process allowing a first person to select a thumbnail image to accompany their video message as an identifier to be presented to the second person. Once the recording of the first person video message is completed, the first user is preferably presented with a number of thumbnail still images extracted from the recorded sequence of the video component of the first person video message. The first user can preferably tap any of the thumbnail images to select and present the chosen thumbnail image that appears next to the first person video message when presented to the second person. This allows the first user to select the most interesting or appealing thumbnail image to display to the second person upon receiving the message.
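A minimal sketch of choosing which frames to offer as thumbnails follows. The specification does not say how the still images are selected from the recorded sequence, so evenly spaced sampling, the default of five thumbnails and the function name `thumbnail_frame_indices` are assumptions.

```python
def thumbnail_frame_indices(frame_count, thumbnails=5):
    """Pick evenly spaced frame indices to present as candidate thumbnails.

    The recording is divided into equal segments and the middle frame of
    each segment is chosen, which avoids the very first and last frames.
    """
    if frame_count <= 0 or thumbnails <= 0:
        return []
    thumbnails = min(thumbnails, frame_count)
    return [(2 * i + 1) * frame_count // (2 * thumbnails)
            for i in range(thumbnails)]
```

For a 30 second message captured at an assumed 30 frames per second (900 frames), this yields frames 90, 270, 450, 630 and 810.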

[0041] Another pre-processing step that is preferably implemented by the secondary software application operating on the first person personal computing device is cropping of recorded video messages to a rectangular and preferably square format. This is typically achieved by cropping about a personal computer device specific centre, aided by a circular viewfinder to assist with a meaningful crop. This in turn allows a reduction in the video file size and optimises the transfer time of the data relating to the pre-processed video component of the first person video message. Preferably, the device specific centre is set by the operating software of the personal computer device. Most personal computer devices have operating software that allows the device to detect the camera's orientation and/or the centre of the camera recording zone using the hardware of the personal computer device. The system of the present invention will therefore preferably utilise the personal computer device's own operating software to locate the centre of the recorded visual component.
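The square crop about a device specific centre can be sketched as a small geometry helper. The function name `square_crop` and the choice to shift the crop window back inside the frame when the centre lies near an edge are assumptions; the specification only states that the crop is made about the centre reported by the device.

```python
def square_crop(width, height, centre_x, centre_y):
    """Return (left, top, side) of the largest square crop around a centre.

    The side equals the shorter frame dimension. If the requested centre
    is too close to an edge, the window is shifted so that it stays fully
    inside the frame.
    """
    side = min(width, height)
    half = side // 2
    left = min(max(centre_x - half, 0), width - side)
    top = min(max(centre_y - half, 0), height - side)
    return left, top, side
```

For a 1280 by 720 frame centred at (640, 360) the result is a 720 by 720 window starting at x = 280, which reduces the pixel count per frame from 921,600 to 518,400.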

[0042] A feature-detection (or facial recognition) algorithm may be performed over each frame in the recorded visual component to determine the orientation and/or position of the person's face. The aim of feature detection when used in this way is preferably two-fold: (1) locate a plane using the eyes as reference, i.e. a straight line can be drawn between pupil centers, and extended out to the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, and (2) locate a plane where the shoulders of the user meet the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, i.e. connect the points where the shoulders meet the edges using a straight line.

[0043] This may allow the secondary software application to center the user's face in the crop zone and crop around that center.
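Given landmark coordinates from some feature detector, the two reference lines and a crop centre can be derived with simple geometry. The sketch below assumes the detector has already supplied pixel coordinates for the pupils and for the points where the shoulders meet the frame edge; the function names and the choice of the pupil midpoint as the crop centre are illustrative assumptions, not the method defined by the specification.

```python
import math


def line_tilt_degrees(point_a, point_b):
    """Angle of the straight line from point_a to point_b, in degrees."""
    (ax, ay), (bx, by) = point_a, point_b
    return math.degrees(math.atan2(by - ay, bx - ax))


def face_crop_centre(left_pupil, right_pupil):
    """Midpoint between the pupils, used here as the centre of the crop."""
    (lx, ly), (rx, ry) = left_pupil, right_pupil
    return round((lx + rx) / 2), round((ly + ry) / 2)
```

The same `line_tilt_degrees` helper applies to the eye line and to the shoulder line, so a large difference between the two angles could, for instance, indicate a tilted head rather than a tilted camera.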

[0044] Once ready, the first user can send/submit the fan message.

[0045] As a first person is normally not going to be playing sound while the device's microphone is open, echo will normally not be a problem, but acoustic echo cancellation may be used. If used, it is generally applied before a second technique, known as noise suppression, is applied. If acoustic echo cancellation is not applied, noise suppression is applied as the first pre-processing step. Unfortunately, noise suppression is usually destructive to the audio component captured; in other words, it removes some of the intended sound as well as the background noise. To recover the speech component, which is important in the context of the present invention, a technique called acoustic gain control is preferably applied. This enhances the voice-like sounds in the audio back to pre-noise suppression levels and makes the audio component more clearly audible.
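The ordering of the audio pre-processing stages can be sketched as a short pipeline. The stages themselves are deliberately crude stand-ins: a threshold gate is used in place of real noise suppression, and a peak-based gain is used in place of real gain control. These substitutions, the function names and the sample representation (floats between -1.0 and 1.0) are all assumptions; only the stage order is taken from the description.

```python
def noise_gate(samples, threshold=0.05):
    """Stand-in for noise suppression: silence samples below a threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def gain_control(samples, target_peak=0.9):
    """Stand-in for gain control: scale so the loudest sample hits a target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples, echo_canceller=None):
    """Run optional echo cancellation, then noise suppression, then gain."""
    stages = []
    if echo_canceller is not None:
        stages.append(echo_canceller)   # applied first when present
    stages += [noise_gate, gain_control]
    for stage in stages:
        samples = stage(samples)
    return samples
```

On the first person device `preprocess` would typically be called without an echo canceller, while on the second person device a callable implementing echo cancellation would be passed in so that it runs before the other two stages.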

[0046] The audio pre-processing preferably occurs on the personal computing device used to capture the video message. This will preferably optimise the audio component of the captured video message for upload to the primary software application operating on the computer network or server.

[0047] Most personal computing devices used within the system of the present invention capture or record the visual component of the video message in full width (720p/HD or 480p/SD). As a preferred visual pre-processing step, the captured visual component of the video message is preferably cropped to approximately half width around a personal computing device specific center. This visual component pre-processing preferably occurs on the personal computing device used to capture the video message.

[0048] Once the fan has chosen to send/submit the fan message, the secondary software application operating on the personal computing device will typically forward the fan message to the primary software application operating on the computer server or computer network via the available data transmission pathways. If required, the message can be compressed in size prior to sending and then decompressed by the primary software application.
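The optional compress-then-decompress step can be sketched with a general purpose lossless codec. The specification does not name a compression scheme, so the use of Python's `zlib` module and the function names below are assumptions.

```python
import zlib


def pack_for_upload(payload: bytes, level: int = 6) -> bytes:
    """Compress the message bytes on the device before sending."""
    return zlib.compress(payload, level)


def unpack_on_server(blob: bytes) -> bytes:
    """Restore the original message bytes after receipt."""
    return zlib.decompress(blob)
```

Because the round trip is lossless, the primary software application receives exactly the bytes that the device prepared. Note that video which has already been encoded with a video codec usually shrinks very little under a general purpose compressor, so the benefit depends on the payload.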

[0049] Once the primary software application operating on the computer server or computer network receives the fan message, the fan message is then preferably forwarded to the influencer. There may be a vetting stage applied to the fan message to ensure that inappropriate fan messages are not forwarded to the influencer. This vetting stage may be accomplished automatically by a part of the primary software application using image recognition software to identify inappropriate images and/or voice or word recognition used to recognise inappropriate audio.
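The word recognition part of the vetting stage can be sketched as a check of a transcript against a list of blocked terms. The sketch assumes that a speech recognition step has already produced a text transcript; the function names and the simple whole-word matching rule are hypothetical, and image recognition is not covered here.

```python
import re


def flag_transcript(transcript, blocked_terms):
    """Return the blocked terms that appear as whole words in a transcript."""
    words = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    return sorted(words & {term.lower() for term in blocked_terms})


def should_forward(transcript, blocked_terms):
    """A message is forwarded only if no blocked term was recognised."""
    return not flag_transcript(transcript, blocked_terms)
```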

[0050] When a fan has already set up their profile and logs into the system at subsequent times, the fan will normally be presented with a homepage interface. The homepage interface will typically include an operations bar or buttons, normally at an upper or lower portion and the operations bar will normally include a home button, search button, a record button, access to the storage facility having the stored messages and replies and a profile button allowing the fan to edit their profile. A generic homepage will also typically include action buttons allowing the fan to ask a question or create a fan message and/or answer a question or fan message.

[0051] A similar process to that described above will typically be followed when an influencer logs into the system for the first time. As an influencer using the system, the influencer will typically open the secondary software application operating on their personal computing device by tapping the application icon or tile. The first time an influencer uses the application, the influencer will normally be presented with a signup interface generated and displayed on the display of the personal computing device. The signup interface will prompt the creation of an influencer profile including entry of salient information such as the influencer's name, email address, preferred password and a picture or image to be used as the profile image.

[0052] The influencer profile may include other information such as gender, date of birth, address, preferences and/or interests although this information is optional and may be added at a later time into the influencer profile. The influencer profile is preferably stored in a user profile in association with the primary software application operating on the computer server or computer network. The system administrator may undertake a vetting process when an influencer creates a new profile or updates their profile to ensure that the information added into the profile is not scandalous or contrary to law in any way and/or that the information added complies with the information required by the system.

[0053] Information is preferably entered into the secondary software application using a virtual keyboard which is produced and displayed on the interface, normally as an overlay, and/or uploaded using the personal computing device, in particular using the image capture software and/or the audio capture software present on the personal computing device. The information will normally be entered into one or more entry fields provided on the interface and there will normally be one or more action buttons on the interface to allow entry of information and/or movement about the interface.

[0054] Once the influencer has set up their user profile, there will normally be an action button allowing them to continue. The influencer can manually activate the "continue" action button and activation of this button will normally trigger generation and display of a further interface.

[0055] Influencers will preferably have the ability to set a topic for discussion and edit that topic as required in order to prompt or maintain the interest of fans. The influencer may be incentivised in order to maintain the interest of fans and be rewarded or incentivised according to the number of fans requesting interaction with the influencer.

[0056] Importantly, the influencer will be able to view fan messages and answer fan messages with an appropriate action button provided on an interface generated and displayed on the influencer personal computing device. The influencer will typically be given an indication of the number of pending fan messages that are awaiting an answer, normally on a home screen interface. Such an interface may also include a recent activity portion which includes or identifies information relating to the influencer's recent activity. Where more than one entry occurs on the recent activity portion, the recent activity portion may be movable to advance through the recent activity posts. Normally, this is achieved by sliding or swiping the recent activity portion.

[0057] When the influencer selects the action button allowing the influencer to answer a fan message, the new interface is typically generated and displayed on the personal computing device of the influencer and at the same time, the audio and visual capture devices of the personal computing device are preferably activated so that the influencer can see a real-time image of themselves on the display of their personal computing device and also a preview portion showing previews of the unanswered fan messages.

[0058] Normally, the preview portion includes at least a screenshot "still" from the fan message or the fan's profile picture, preferably the thumbnail chosen by the first user or fan as outlined above. The influencer can move through the pending fan messages by direct manipulation on the display of the personal computing device such as by swiping or sliding for example. Selection of a particular fan message to be answered by the influencer will preferably cause the secondary software application operating on the influencer personal computing device to start capture of video and/or audio as the fan message plays in order to capture the reaction recording in real time. The selection may occur in any way using any motion on the display to initiate the selection and the capture of the reaction recording.

[0059] Preferably, the fan message will display in a different portion of the interface to the influencer image so that the influencer can see the fan message being played as well as having the influencer image captured and played back to the influencer in real time. Preferably, it will be possible for the influencer to pause the play of the fan message and/or stop the play of the fan message. However, in order to capture the most realistic reaction recording, it is preferred that once the fan message has been initiated, the influencer cannot pause or stop the fan message until the end of the fan message.

[0060] Once the fan message has ended, and the reaction recording has taken place, the influencer can then record a response to the fan message. This is typically done through a similar process as the fan recording a fan message as explained above. Once the influencer has recorded their response, the influencer can typically review the response and either dump or trash the response.

[0061] Preferably, the secondary software application will generate and display a simple thumbnail selection process allowing a second person to select a thumbnail image to accompany their video message as an identifier to be presented to the first person. Once the recording of the second person video message is completed, the second person is preferably presented with a number of thumbnail still images extracted from the recorded sequence of the video component of the second person video message. The second user can preferably tap any of the thumbnail images to select and present the chosen thumbnail image that appears next to the second person video message when presented to the first person. This allows the second user to select the most interesting or appealing thumbnail image to display to the first person upon receiving the message.

[0062] Another pre-processing step that is preferably implemented by the secondary software application operating on the second person personal computing device is cropping of recorded video messages to a rectangular and preferably square format. In a particularly preferred embodiment, the rectangular format is provided with a circular viewfinder overlay to highlight the features within the circular viewfinder and obscure the features outside the circular viewfinder. This can be done in any way but preferably a circular viewfinder layer is provided about the identified centre of the visual component and everything outside the circular viewfinder is then obscured, for example by fuzzing, defocussing or rendering at least partially opaque, for example by applying either a darkened-transparent layer (i.e. alpha-channel transparency) or a blurring layer.
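As a minimal sketch of the darkened-layer option described above, the following example dims every pixel of a grayscale frame that falls outside a circular viewfinder. The function name `apply_circular_viewfinder`, the list-of-rows frame representation and the `dim` factor are assumptions made for illustration only; a practical implementation would more likely operate on GPU textures or image buffers.

```python
def apply_circular_viewfinder(frame, center, radius, dim=0.5):
    """Darken pixels outside a circle, leaving the inside untouched.

    frame  : list of rows, each a list of grayscale values (0-255)
    center : (x, y) centre of the viewfinder in pixel coordinates
    radius : viewfinder radius in pixels
    dim    : multiplier applied outside the circle (hypothetical default)
    """
    cx, cy = center
    r2 = radius * radius
    out = []
    for y, row in enumerate(frame):
        out.append([
            px if (x - cx) ** 2 + (y - cy) ** 2 <= r2 else int(px * dim)
            for x, px in enumerate(row)
        ])
    return out


# A 5x5 uniform frame with the viewfinder centred on the middle pixel.
frame = [[200] * 5 for _ in range(5)]
for row in apply_circular_viewfinder(frame, center=(2, 2), radius=2):
    print(row)
```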

[0063] This is typically achieved by cropping about a personal computer device specific centre aided by a circular viewfinder to assist with a meaningful crop. This in turn allows a reduction in the video file size and optimises the transfer time of the data relating to the pre-processed video component of the second person video message. Preferably, the device specific centre is set by the operating software of the personal computer device. Most personal computer devices have operating software that allows the device to detect the camera's orientation and/or the centre of the camera recording zone using the hardware of the personal computer device. The system of the present invention will therefore preferably utilise the personal computer device's own operating software to locate the centre of the recorded visual component.

[0064] A feature-detection (or facial recognition) algorithm may be performed over each frame in the recorded visual component to determine the orientation and/or position of the person's face. The aim of feature detection when used in this way is preferably two-fold: (1) locate a plane using the eyes as reference, i.e. a straight line can be drawn between facial features such as pupil centers, and extended out to the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, and (2) locate a plane where the shoulders of the user meet the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, i.e. connect the points where the shoulders meet the edges using a straight line.

[0065] This may allow the secondary software application to center the user's face in the crop zone and crop around that center.
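The eye-line reference and the subsequent centred crop can be sketched as follows. This is an illustrative sketch only: it assumes the pupil centres have already been located by some feature detector (not shown), and the names `eye_line` and `square_crop_around` are hypothetical.

```python
import math


def eye_line(left_pupil, right_pupil):
    """Return the midpoint of the eye line and its tilt in degrees."""
    (x1, y1), (x2, y2) = left_pupil, right_pupil
    mid = ((x1 + x2) / 2, (y1 + y2) / 2)
    tilt_deg = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return mid, tilt_deg


def square_crop_around(center, side, frame_w, frame_h):
    """Return (left, top, width, height) of a square crop about center.

    The side is limited to the frame size and the crop is clamped so that
    it stays inside the frame.
    """
    side = min(side, frame_w, frame_h)
    left = int(round(center[0] - side / 2))
    top = int(round(center[1] - side / 2))
    left = max(0, min(left, frame_w - side))
    top = max(0, min(top, frame_h - side))
    return left, top, side, side


mid, tilt = eye_line((300, 200), (340, 200))
print(mid, tilt)
print(square_crop_around(mid, 360, 640, 480))
```

The tilt value is not used by the crop in this sketch; it is returned because the orientation of the face is one of the quantities the feature detection is said to determine.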

[0066] Once completed, the second user/influencer can send/submit the response.

[0067] The audio component of the second person video message preferably needs to be cleared of echo caused by the second person's personal computing device speaker(s) playing the audio component of the first person video message while the second person's personal computing device microphone is open. Acoustic echo cancellation is preferably used to remove or at least diminish an echo. After applying this cancellation technique to the audio component of a captured second person video message, the echo is preferably removed, but it may still be difficult to hear what the user is saying, particularly if there is a significant level of ambient noise (street, party etc.). Noise suppression is therefore preferably applied. Noise suppression is unfortunately destructive to the audio captured, in other words, it removes some of the intended sound as well as the background noise. To recover the speech component, which is important in the context of the present invention, a third technique called acoustic gain control is typically applied. This enhances the audio close to pre-noise suppression levels and makes the audio more audible.
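The order of the three audio stages described above can be illustrated with the following toy sketch. Each stage is a deliberately simplified stand-in and not the technique a production system would use: the echo stage subtracts a copy of the played-back reference with a fixed, known delay and gain (a real acoustic echo canceller estimates these adaptively), the noise stage is a simple noise gate, and the gain stage is peak normalisation. All function names and parameter values are hypothetical.

```python
def cancel_echo(mic, reference, delay, echo_gain):
    """Subtract a delayed, attenuated copy of the reference from the mic."""
    out = []
    for n, sample in enumerate(mic):
        k = n - delay
        echo = echo_gain * reference[k] if 0 <= k < len(reference) else 0.0
        out.append(sample - echo)
    return out


def suppress_noise(signal, threshold):
    """Noise gate: zero every sample whose magnitude is below threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in signal]


def apply_gain(signal, target_peak):
    """Scale the signal so that its largest magnitude equals target_peak."""
    peak = max((abs(s) for s in signal), default=0.0)
    if peak == 0.0:
        return list(signal)
    gain = target_peak / peak
    return [s * gain for s in signal]


def preprocess_audio(mic, reference, delay=2, echo_gain=0.5,
                     threshold=0.05, target_peak=0.9):
    """Apply echo cancellation, then noise suppression, then gain control."""
    cleaned = cancel_echo(mic, reference, delay, echo_gain)
    gated = suppress_noise(cleaned, threshold)
    return apply_gain(gated, target_peak)


reference = [1.0, 0.0, 0.0, 0.0]   # audio played through the speaker
mic = [0.01, 0.0, 0.5, 0.4]        # noise, echo of reference, then speech
print(preprocess_audio(mic, reference))
```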

[0068] The audio pre-processing preferably occurs on the personal computing device used to capture the video message. This will preferably optimise the audio component of the captured video message for upload to the primary software application operating on the computer network or server.

[0069] Most personal computing devices used within the system of the present invention capture or record the visual component of the video message in full width (720p/HD or 480p/SD). As a visual pre-processing step, the captured visual component of the video message is preferably cropped to approximately half width around a personal computing device specific center. This visual component pre-processing preferably occurs on the personal computing device used to capture the video message.

[0070] Once sent or submitted, the influencer response will be conveyed to the primary software application operating on the computer network or computer server.

[0071] After the message and response videos are uploaded, the actual composing pipeline starts. In this process, the primary software application deals with the problem of different personal computing devices having different recording profiles (camera resolution, audio performance etc.), captured video messages being recorded in different geographical locations, generating different assets depending on video type and outputting the resulting collateral into a format that can be easily consumed by client devices.

[0072] A key part of the composition is to create a circular concatenated video master file of a composed message, which is used to create a novel and engaging presentation through animation of the visual representations of the first person video message and the second person video message in the concatenated video master file.

[0073] There are two preferred embodiments of animation of the first person video message and the second person video message, namely a first preferred embodiment where only the reaction of the asking party is captured and included in the composited output, and a second preferred embodiment where both reactions of the asking party and responding party are captured, and both are included in the composited output.

[0074] As the video messages from the first person and second person are preferably in circular format, as are the faces that are usually captured, a compact yet comfortable and engaging viewing experience is possible through animation of the first person video message and the second person video message.

[0075] In a preferred embodiment, the visual representations of the video messages will preferably be concatenated to be displayed in different positions relative to one another in the concatenated video master file in order to form an intimate visual connection or relationship between the visual representations of the first person video message and the second person video message, with appropriate hierarchical prominence or significance provided for each visual representation dependent upon the position in the playback of the concatenated composed message.

[0076] This will preferably involve positioning the visual representations of the first person video message and the second person video message about the interface and moving the visual representations as required during the playback. The respective positions and movements will normally be applied to each of the visual representations of the first person video message and the second person video message during the composition or concatenation process undertaken by the primary software application.

[0077] Positions such as which of the visual representations of the first person video message and the second person video message is in the foreground, movement of the visual representations of the first person video message and the second person video message into and out of the interface, and overlapping of the visual representations of the first person video message and the second person video message (both in terms of degree of overlap and which overlaps which) can be used.

[0078] According to the first preferred embodiment, when a first person is asking a question and the second person is listening and a reaction recording is captured, the visual representations of the two video messages can be centrally located and overlapped to imply an intimate connection or relationship between the visual representations of the first person video message and the second person video message, giving some natural emphasis to the video message of the first person when asking the question, for example by having the first person video appear in the foreground, and slightly over the top of the second person's video containing the reaction to the question. At the end of the first person's video message, the visual representation of the first person video message can move off the interface (for example to one side) and the visual representation of the second person video message can be moved to the centre of the interface.
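A minimal sketch of the animation of the first preferred embodiment is given below. It is illustrative only: horizontal positions are expressed as fractions of the interface width, a larger z value means closer to the foreground, and the specific coordinates, the slide duration and the names `lerp` and `layout_at` are all assumptions rather than values taken from the specification.

```python
def lerp(a, b, p):
    """Linear interpolation between a and b for p in [0, 1]."""
    return a + (b - a) * p


def layout_at(t, message_len, slide_time=0.5):
    """Return {name: (x_centre, z_order)} for playback time t in seconds.

    While the first person message plays, the two circles overlap near the
    centre with the first person in the foreground. Afterwards the first
    person circle slides off to the left and the second person circle
    moves to the centre of the interface.
    """
    first_x, second_x = 0.42, 0.58  # overlapped start positions (assumed)
    if t < message_len:
        return {"first": (first_x, 1), "second": (second_x, 0)}
    p = min(1.0, (t - message_len) / slide_time)
    return {
        "first": (lerp(first_x, -0.5, p), 1),   # moves off the interface
        "second": (lerp(second_x, 0.5, p), 0),  # moves to the centre
    }


for t in (0.0, 10.25, 11.0):
    print(t, layout_at(t, message_len=10.0))
```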

[0079] According to the second preferred embodiment, the model in the first preferred embodiment may be extended to alternate between the visual representation of the first person video message and the visual representation of the second person video message in the form of a reaction and then the visual representation of the second person video message in the form of a video reply to the first person, and then the visual representation of the first person video message reaction of the first person listening to the second person's video reply message. In this embodiment, the visual representations of the video messages will preferably be swapped in prominence as the respective video messages are being delivered.

[0080] The animation will preferably be applied by the primary software application once the primary software application has received the normalised audio components and normalised video components of both the first person video message and the second person video message. One or more template interfaces may be provided into which the primary software application locates the normalised video components.

[0081] The primary software application will then composite or concatenate the video message, namely the fan message, the reaction recording and the influencer response into a single concatenated video master file. It is important to realise at this juncture that the reaction recording and the influencer response may be substantially continuously recorded and therefore, although there may be two portions to the message, such a combined or continuously recorded message may be a single message and may be combined with the fan message to form a composite message. The composite message is then normally sent back to at least the fan. The composite message may be also loaded onto the influencer profile and will typically be stored against both the influencer profile and the fan profile.

[0082] The fan and/or the influencer can also share the concatenated video master file with third parties. Preferably, a still shot of the composite message (or a still shot of a portion of the message) may be provided as a part of the sharing of the composite message. Text can preferably be added by either the fan and/or the influencer to the sharing of the composite message.

Normally, an interface will be generated and displayed on the personal computing device of either the fan and/or the influencer allowing the sharing and this interface may provide the ability for the fan and/or the influencer to nominate the mechanism of sharing, for example by designation of social media networks and the like.

[0083] The composite message will typically play the first person message and the second person reaction response at substantially the same time but preferably slightly delay the second person reaction response in order to simulate a short reaction time between the first person message and the influencer reaction response. This will typically elevate the realism provided by the system of the present invention. Once the first person message and second person reaction response have finished, the second person response will typically play.
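The playback ordering described above can be sketched as a simple timeline. The `Segment` class, the `build_timeline` function and the default delay of 0.3 seconds are hypothetical and serve only to show the slight offset of the reaction and the start of the response once both earlier segments have finished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    start: float  # seconds from the start of the composite message
    end: float


def build_timeline(message_len, reaction_len, response_len,
                   reaction_delay=0.3):
    """Lay out the three recordings on the composite message timeline."""
    message = Segment("first_person_message", 0.0, message_len)
    # The reaction plays alongside the message, slightly delayed.
    reaction = Segment("second_person_reaction", reaction_delay,
                       reaction_delay + reaction_len)
    # The response starts once both the message and the reaction are done.
    response_start = max(message.end, reaction.end)
    response = Segment("second_person_response", response_start,
                       response_start + response_len)
    return [message, reaction, response]


for segment in build_timeline(10.0, 10.0, 20.0, reaction_delay=0.5):
    print(segment)
```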

[0084] Whilst the above is an outline of what a user sees or experiences when using the system, the present invention will also preferably implement a number of pre-processing steps that normalise the captured videos before passing them on to the primary software application operating on the computer server or network. At that point, several files and pieces of information are pulled in and processed to produce the resulting master video composed message file. This is preferably all performed in a scalable way. To optimise speed and quality, a number of actions may be distributed across the clients (audio, cropping etc.) and GPUs may be used for the computing in preference to traditional CPU power.

[0085] Any of the features described herein can be combined in any combination with any one or more of the other features described herein within the scope of the invention.

[0086] The reference to any prior art in this specification is not, and should not be taken as an acknowledgement or any form of suggestion that the prior art forms part of the common general knowledge.

BRIEF DESCRIPTION OF DRAWINGS

[0087] Preferred features, embodiments and variations of the invention may be discerned from the following Detailed Description which provides sufficient information for those skilled in the art to perform the invention. The Detailed Description is not to be regarded as limiting the scope of the preceding Summary of the Invention in any way. The Detailed Description will make reference to a number of drawings as follows:

[0088] Figure 1 is a schematic view of a preferred embodiment of the hardware portion of the system for the recording of separate asynchronous video conversations and compositing them into one video, to simulate a real-time, synchronous conversation.

[0089] Figure 2 shows a schematic view of the overall operation of a preferred embodiment of the system of the present invention in forming the composite message.

[0090] Figure 3 is a graphical representation of how the audio phenomenon known as echo occurs.

[0091] Figure 4 is a graphical representation of a personal computing device according to a preferred embodiment of the present invention with an interface generated thereon showing the establishment of a device specific centre for the interface which is centred on the user's face.

[0092] Figure 5 is a graphic illustration of an input audio component and the application of an acoustic echo cancellation filter, a noise suppression filter and an acoustic gain control filter to result in a pre-processed audio component according to a preferred embodiment of the present invention.

[0093] Figure 6 is a flow chart representation of the method according to a particularly preferred embodiment of the present invention.

[0094] Figure 7 is a graphical illustration of an interface of a secondary software application according to a preferred embodiment of the present invention showing the visual representation of the first person video message and the visual representation of the second person video message and allowing a first user to select a thumbnail image.

[0095] Figure 8 is a flow chart representation of a first preferred animation sequence according to which the visual representations of the first person video message and second person video message are displayed in different positions relative to one another in the concatenated video master file.

[0096] Figure 9 is a flow chart representation of a second preferred animation sequence according to which the visual representations of the first person video message and second person video message are displayed in different positions relative to one another in the concatenated video master file.

DESCRIPTION OF EMBODIMENTS

[0097] According to a particularly preferred embodiment of the present invention, a system for compositing asynchronous video messages and responses is provided.

[0098] The general hardware implementing the system of the preferred embodiment is illustrated in Figure 1. As illustrated, the system of the preferred embodiment operates using a computer server 10 which interacts with, transfers information to and receives information from a number of personal computing devices, of which two types are illustrated, namely a tablet device 11 and a smartphone 12, through a cloud network 13. Information can be transferred to and from the computer server 10 and the personal computing devices 11, 12 in order to implement the system of the preferred embodiment.

[0099] The system of the preferred embodiment includes a computer server 10 operating a primary software application, at least one first person with a personal computing device having an audio capture device, a video capture device and data transmission capability and operating a secondary software application to create an electronic first person message having an audio component and a video component, the at least one first person forwarding the first person message to a second person over an electronic data transmission network 13 accessible through at least the secondary software application operating on the at least one first person's personal computing device, at least one second person with a personal computing device having an audio display and capture device, a video display and capture device and data transmission capability and operating a secondary software application to allow the at least one second person to initiate playback of the first person message on the second person's personal computing device at a time convenient to the second person, capturing a second person reaction recording in real time via the secondary software application, audio display and capture device and video display and capture device of the second person's personal computing device, based on initiating the playback of the first person message, the second person creating a second person response message having an audio component and a visual component in response to the first person message using the secondary software application, the audio display and capture device and video display and capture device of the second person's personal computing device, transmitting the electronic first person message, the second person reaction recording and the second person response message to the primary software application operating on the at least one computer server or computer network, the primary software application operating on the computer server compositing the first person message, the second person reaction recording and the second person response into a composite recording; and delivering the composite recording to at least the first person via an electronic data transmission network 13.

[0100] The preferred embodiment of the present invention is a system for compositing asynchronous video messages and responses, which is best explained conceptually with reference to Figure 2. The present invention provides a method and system for the recording of separate asynchronous video messages and compositing them into one video, which simulates a real-time, synchronous conversation. In a most preferred embodiment, the method and system of the present invention will be used to allow a fan to interact with an influencer.

[0101] The method and system of the present invention is based about a fan utilising a personal computing device such as a smartphone 12, or tablet computer 11 to capture an audio/video message and then send that audio/video message to an influencer who also has a personal computing device such as a smartphone 12, or tablet computer 11 for playback, record the influencer's reaction to the fan's message, allow the influencer to record an audio/video message in response thereto and then composite the messages and reaction recording into a single composite message 14.

[0102] Preferably, access to the system for both the fan and the influencer will be via a respective personal computing device. The secondary software application operating on both the fan and the influencer personal computing devices is preferably the same software application and the functionality and use of the software application determined by the category of the user which is preferably determined at login according to the user's unique login information. The personal computing device can be any type however, it will typically be a smart phone or computer tablet having at least one communication pathway in order to communicate with the computer server 10 operating the primary software application.

[0103] The personal computing device preferred for use in the present invention includes a processor with on-board memory, a display, at least one input apparatus, at least one output apparatus (such as audio output, directly via speakers or similar or indirectly via a port or similar allowing the connection of speakers, visual output or similar), and access to at least one communication pathway to transmit data between system components. Normally, the display will preferably be a touchscreen as many personal computing devices currently available have this feature. The advantages of the touchscreen include allowing a larger display and also allowing the display to function as a part of or as, the input apparatus. The display will also function as a video playback device. These types of devices also usually have at least one camera and at least one microphone which will allow video capture and audio capture.

[0104] Preferably, the software operating on the hardware of the preferred embodiment includes a primary software application operating on the computer server 10 and a secondary software application provided at the personal computing device level on each of the user personal computing devices in order to interact with the primary software application.

[0105] The primary software application is preferably the "engine" of the system and method, responsible for receiving the various messages and recordings created or captured by a user (fan and/or influencer) and to composite these into a single message for delivery to a user.

[0106] Preferably, the secondary software application is provided to operate on a smartphone 12 or tablet 11 with appropriate connections through the smartphone 12 or tablet 11 to the computer server 10 operating a primary software application in order to gain additional information to that present on the smartphone 12 or tablet 11. The additional information may be obtained from the computer server 10 and/or by push notification from the computer server 10 to the smartphone 12 or tablet 11 and/or upon request from the smartphone 12 or tablet 11.

[0107] The secondary software application will preferably allow communication with the primary software application operating on the computer server 10. Preferably, the primary software program operating on the computer server 10 is more advanced and is responsible for the bulk of the processing, with the secondary software application operating on the smartphone 12 or tablet 11 typically being smaller and having less processing power available to it, optimised to send and receive instructions and requests and leaving the operations requiring larger processing power to the primary software program operating on the computer server 10.

[0108] Users will normally download the secondary software application to their smartphone 12 or tablet 11. The download of the software application will normally include appropriate instructions to be stored in the memory of the smartphone 12 or tablet 11 in order to create and maintain links and associations with the computer server 10 in order to communicate with one or more databases stored thereon.

[0109] The smartphone 12 or tablet 11 normally provides access to one or more communications pathways in order to communicate with the computer server 10 in order to access the system. Normally, the computer server 10 will include or have access to one or more databases containing information about the users such that information regarding the identity of any one or more of these parties may be communicated by the smartphone 12 or tablet 11 or the software application to ensure that the smartphone 12 or tablet 11 requesting data from the computer server 10 or to which data is to be sent or from which information is received, is a smartphone 12 or tablet 11 of an authorised user of the system. This functionality is normally accomplished through a login facility in which the user uses a smartphone 12 or tablet 11 to log into the system.

[0110] The login process may use login details that the user has developed for another application or use. For example, the user may use a Facebook or Twitter account login or similar or alternatively login details for an email system such as Gmail or Hotmail in order to access the system of the present invention. Normally, details of the user login will be stored in a corresponding user profile in at least one user database and as a login request is received, the computer server or computer network will typically ensure that the login details supplied match those of a user before allowing access to the system and any databases on the system.

[0111] Normally, a login prompt is produced and displayed as a displayed image or interface on the display of the personal computing device and including at least one action button. This will normally allow input or selection of the desired login information into an input template and which also prompts input of the login information provided by action in the form of a submission to the computer server or computer network. This will normally be a two-part process in which the user will normally select the desired login type if permitted followed by entry of the user particular identification information and password followed by the submission step. Upon submission, the entered details will be sent to the primary software application operating on the computer server 10 for authorisation.

[0112] Therefore, the smartphone 12 or tablet 11 is typically used to create a login request which is then sent via a communications pathway to the computer server 10 whereupon the system of the present invention checks the user database(s) for a match and allows access to the system if the match occurs and denies access to the system if a match does not occur.
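By way of illustration only, the matching step described in paragraph [0112] might be sketched as follows. The in-memory dictionary standing in for the user database, the function names `register` and `check_login`, and the use of salted PBKDF2 hashes are all assumptions made for this sketch; the specification does not name a storage scheme.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 is an assumed choice, not taken from the specification.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(user_db: dict, user_id: str, password: str) -> None:
    # Store a per-user salt and the derived hash (hypothetical profile layout).
    salt = os.urandom(16)
    user_db[user_id] = (salt, hash_password(password, salt))

def check_login(user_db: dict, user_id: str, password: str) -> bool:
    """Allow access only if the supplied details match a stored user."""
    record = user_db.get(user_id)
    if record is None:
        return False  # no matching user profile: deny access
    salt, stored = record
    # Constant-time comparison of the stored and supplied hashes.
    return hmac.compare_digest(stored, hash_password(password, salt))
```

A login request for an unknown user, or with a non-matching password, returns False and access would be denied.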

[0113] An input apparatus used to input information into the smartphone 12 or tablet 11 therefore will typically be formed or displayed on the display of the personal computing device as required, normally in the form of a virtual keyboard including letters of the alphabet, numbers and/or symbols as well as one or more action icons to allow a user to implement action on the smartphone 12 or tablet 11.

[0114] The method of the present invention is preferably achieved by computer hardware operating software containing instructions in association with one or more communications pathways between the computer hardware operating software compliant with the system, in order to achieve the method.

[0115] The computer server or computer network will normally include a processor with memory operating instructions and a number of databases stored in electronic form. The databases will typically include at least one user database containing a unique user profile for each user of the system and at least one database of messages and/or recordings may be maintained separately from the at least one user database or alternatively, the messages and/or recordings may be stored in the respective user profiles. It is anticipated that the at least one user database can be provided as a single database, with the designation of a user as being either a fan or influencer (or both) dependent upon the use of the system.
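One possible shape for a single user database in which a user may be designated a fan, an influencer, or both is sketched below. The class name `UserProfile`, its field names and the `designate` helper are hypothetical and are offered only as an example of the arrangement described in paragraph [0115].

```python
from dataclasses import dataclass, field

ROLES = ("fan", "influencer")

@dataclass
class UserProfile:
    user_id: str                                       # unique per user
    name: str
    roles: set = field(default_factory=set)            # "fan", "influencer" or both
    message_ids: list = field(default_factory=list)    # messages stored against the profile

def designate(profile: UserProfile, role: str) -> None:
    """Add a role to a profile; a user may hold both roles at once."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    profile.roles.add(role)
```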

[0116] The system of the present invention will normally be implemented through instructions which when followed, generate one or more interfaces on a smartphone 12 or tablet 11, and examples of these interfaces are included as Figures 3 to 13. The instructions will normally be sent from the primary software application on the computer server 10 to a user's smartphone 12 or tablet 11 and which will then be followed in order to generate an interface in real time and update the interface according to the user's interaction with the system.

[0117] Many smartphones 12 or tablets 11 have touchscreens for display allowing the user to directly interact with the touch screen in order to interact with the interface. One or more "buttons" are provided on the interface to allow the user to interact with the smartphone 12 or tablet 11 and through the smartphone 12 or tablet 11, to interact with the system.

[0118] The system of capturing the video messages will normally be implemented through instructions which when followed, generate one or more interfaces on a personal computing device. The instructions will normally be sent from the primary software application on the computer server or computer network to a user's personal computing device and which will then be followed in order to generate an interface in real time and update the interface according to the user's interaction with the system.

[0119] Many of these personal computing devices have touchscreens for display allowing the user to directly interact with the touch screen in order to interact with the interface. However, a normal non-touchscreen display can be used with a movable pointer or selection tool in order to allow a user to interact with the interface. One or more "buttons" are provided on the interface to allow the user to interact with the personal computing device and through the personal computing device, to interact with the system.

[0120] The generated interface will typically be updated substantially in real time according to the rules or instructions which are issued by the primary software application operating on the computer server or computer network and the at least one user database. The generated interface will also typically be updated substantially in real time according to interactions by the user(s) with the system.

[0121] In use, and from the user point of view, the system will normally include a selection interface which will preferably allow a fan to drill down into different areas in order to identify particular influencers. Once the fan has identified one or more influencers to follow, the fan can then preferably "follow" the influencer by tapping an action button which will trigger the addition of that influencer to the fan's profile. Normally, after the setup stage, every time the fan logs into the system, the fan can select from a stored list of influencers that they are following in order to undertake further action.

[0122] Selection of a particular influencer will typically trigger generation and display of an influencer profile interface. The influencer profile will typically be constructed in a manner similar to the fan profile and once created, the fan can view the influencer profile by selection of the influencer from a list. The influencer profile will typically include an image of the influencer, together with information relating to the influencer such as demographic information or statistics and the profile will particularly preferably include a newsfeed or update list of current or historical news in relation to the particular influencer.

[0123] The fan will also preferably have an action button provided on either the influencer profile interface or directly from the influencer selection interface that will allow the fan to create a fan message. Typically, the action button will allow the fan to begin recording a fan message. The action button will typically be known as a shutter button in some preferred embodiments. Other setup buttons may be provided on the interface or on a subsequent interface allowing the fan to set up the recording. The fan will typically be allowed to choose whether video is recorded or a still image is recorded together with audio, and will also be provided with a flip camera button to activate either the front or rear camera on the personal computing device as required.

[0124] Activation of the shutter button by the fan will typically cause the secondary software application operating on the personal computing device to begin recording audio and preferably video via the hardware provided on the personal computing device in order to capture the fan message. Preferably, the fan message will be limited to a particular time limit such as for example 10 seconds in length, 15 seconds in length, 25 seconds in length or 30 seconds in length. Although the fan message will normally be limited, a 30 second length limit is preferred.

[0125] At this stage, the fan may be able to review the fan message that has been captured and can choose to retain the message captured or to trash the fan message and record another.

[0126] Preferably, the secondary software application will generate and display a simple thumbnail selection process allowing a first person to select a thumbnail image to accompany their video message as an identifier to be presented to the second person. A schematic of an interface 70 used for this purpose is shown in Figure 7. Once the recording of the first person video message is completed, the first user is preferably presented with a number of thumbnail still images 71 extracted from the recorded sequence of the video component of the first person video message. The first user can preferably tap any of the thumbnail images to select and present the chosen thumbnail image that appears next to the first person video message when presented to the second person. This allows the first user to select the most interesting or appealing thumbnail image to display to the second person upon receiving the message.
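As a minimal sketch of how candidate thumbnail frames could be chosen from the recorded sequence, the following picks evenly spaced timestamps. The even spacing, the default of five thumbnails and the function name `thumbnail_timestamps` are assumptions; the specification states only that a number of still images are extracted.

```python
def thumbnail_timestamps(duration_s: float, count: int = 5) -> list:
    """Return `count` timestamps (seconds) at the midpoints of equal slices of the clip."""
    if duration_s <= 0 or count <= 0:
        return []
    step = duration_s / count
    # Sampling slice midpoints avoids the very first and very last frames.
    return [round(step * (i + 0.5), 3) for i in range(count)]
```

For a 30 second message and five thumbnails, this yields frames at 3, 9, 15, 21 and 27 seconds.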

[0127] Another pre-processing step that is preferably implemented by the secondary software application operating on the first person personal computing device is cropping of recorded video messages to a square format about a centre as illustrated schematically in Figure 4. This is typically achieved by cropping about the personal computer device specific centre 41 aided by a circular viewfinder 42 to assist with a meaningful crop. This in turn allows a reduction in the video file size and optimises the transfer time of the data relating to the pre-processed video component of the first person video message. Preferably the device specific centre 41 is set by the operating software of the personal computer device. Most personal computer devices have operating software that allows the device to detect the camera's orientation and/or the centre of the camera recording zone using the hardware of the personal computer device. The system of the present invention may utilise the personal computer device's own operating software to locate the centre of the recorded visual component.
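The square crop about a centre can be illustrated with the short sketch below. The function name `square_crop`, the clamping behaviour at the frame edges and the fallback to the geometric centre when no device specific centre is supplied are assumptions made for the example.

```python
def square_crop(width: int, height: int, cx=None, cy=None):
    """Return (left, top, side) of the largest square centred as near (cx, cy) as the frame allows."""
    side = min(width, height)
    # Fall back to the geometric centre when the device reports none (assumption).
    cx = width / 2 if cx is None else cx
    cy = height / 2 if cy is None else cy
    left = int(round(cx - side / 2))
    top = int(round(cy - side / 2))
    # Clamp so the square never extends beyond the recorded frame.
    left = max(0, min(left, width - side))
    top = max(0, min(top, height - side))
    return left, top, side
```

For a 1280 x 720 landscape frame the crop is a 720 x 720 square starting 280 pixels from the left edge, which retains 56.25% of the original pixels and so reduces the data to be transferred.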

[0128] A feature-detection (or facial recognition) algorithm may be performed over each frame in the recorded visual component to determine the orientation and/or position of the person's face within the frame. This can be done as well as the utilisation of the personal computer device's own operating software to locate the centre of the recorded visual component or instead of. The aim of feature detection when used in this way is preferably two-fold: (1) locate a plane using the eyes as reference, i.e. a straight line can be drawn between pupil centers, and extended out to the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, and (2) locate a plane where the shoulders of the user meets the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, i.e. connect the points where shoulders meet the edges using a straight line.

[0129] This may allow the secondary software application to center the user's face in the crop zone and crop around that center.
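A hedged sketch of how the two reference lines of paragraph [0128] might be turned into a crop centre follows. Taking the crop centre as the midpoint between the eye line and the shoulder line, and reporting head tilt from the slope of the eye line, are assumptions of this example; the landmark coordinates are presumed to come from a separate feature detector that is not shown.

```python
import math

def face_centre(left_pupil, right_pupil, left_shoulder, right_shoulder):
    """Return ((cx, cy), tilt_degrees) from four (x, y) landmark points."""
    # Midpoint of the straight line drawn between the pupil centres.
    eye_mid = ((left_pupil[0] + right_pupil[0]) / 2, (left_pupil[1] + right_pupil[1]) / 2)
    # Midpoint of the line joining the points where the shoulders meet the frame edges.
    shoulder_mid = ((left_shoulder[0] + right_shoulder[0]) / 2,
                    (left_shoulder[1] + right_shoulder[1]) / 2)
    # Tilt of the eye line relative to horizontal.
    tilt = math.degrees(math.atan2(right_pupil[1] - left_pupil[1],
                                   right_pupil[0] - left_pupil[0]))
    # Assumed rule: centre the crop halfway between the two reference lines.
    centre = ((eye_mid[0] + shoulder_mid[0]) / 2, (eye_mid[1] + shoulder_mid[1]) / 2)
    return centre, tilt
```

The returned centre could then be supplied to whatever routine performs the crop.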

[0130] Once ready, the first user can send/submit the fan message.

[0131] As a first person is normally not going to be playing sound while the device's microphone is open, echo will normally not be a problem but acoustic echo cancellation may be used. A schematic illustration of the mechanism that creates echo is shown in Figure 3. If used, acoustic echo cancellation is generally applied before a second technique known as noise suppression. If acoustic echo cancellation is not applied, noise suppression is applied as the first pre-processing step. Unfortunately noise suppression is usually destructive to the audio component captured; in other words, it removes some of the intended sound as well as the background noise. To recover the speech component, which is important in the context of the present invention, a technique called acoustic gain control is preferably applied. This enhances the audio close to pre-noise suppression levels and makes the audio component more clearly audible. The three step process of acoustic echo cancellation, noise suppression and acoustic gain control applied to an input audio component to produce a pre-processed audio component is illustrated in Figure 5.
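The ordering of the three steps can be sketched as below. The soft-threshold noise suppressor and the peak-restoring gain stage are deliberately crude stand-ins chosen for this example, and the names `suppress_noise`, `gain_control` and `preprocess_audio` are hypothetical; a practical implementation would use proper signal processing components.

```python
def suppress_noise(samples, floor=0.05):
    """Crude stand-in for noise suppression: shrink every sample towards zero by `floor`."""
    out = []
    for s in samples:
        mag = max(abs(s) - floor, 0.0)   # destructive: speech is attenuated too
        out.append(mag if s >= 0 else -mag)
    return out

def gain_control(samples, target_peak):
    """Scale the signal so its peak returns to the pre-suppression level."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]

def preprocess_audio(samples, echo_canceller=None, floor=0.05):
    # Step 1 (optional): acoustic echo cancellation, supplied as a callable.
    if echo_canceller is not None:
        samples = echo_canceller(samples)
    # Remember the level before suppression so it can be restored afterwards.
    target_peak = max((abs(s) for s in samples), default=0.0)
    # Step 2: noise suppression. Step 3: gain control.
    return gain_control(suppress_noise(samples, floor), target_peak)
```

In this sketch, low-level samples are removed entirely while the loudest speech sample is returned to its original amplitude.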

[0132] The audio pre-processing preferably occurs on the personal computing device used to capture the video message. This will preferably optimise the audio component of the captured video message for upload to the primary software application operating on the computer network or server.

[0133] Most personal computing devices used within the system of the present invention capture or record the visual component of the video message in full width (720p/HD or

[0134] Once the fan has chosen to send/submit the fan message, the secondary software application operating on the personal computing device will typically forward the fan message to the primary software application operating on the computer server or computer network via the available data transmission pathways. If required, the message can be compressed in size prior to sending and then decompressed by the primary software application.

[0135] Once the primary software application operating on the computer server or computer network receives the fan message, the fan message is then preferably forwarded to the influencer. There may be a vetting stage applied to the fan message to ensure that inappropriate fan messages are not forwarded to the influencer. This vetting stage may be accomplished automatically by a part of the primary software application using image recognition software to identify inappropriate images and/or voice or word recognition used to recognise inappropriate audio.

[0136] When a fan has already set up their profile and logs into the system at subsequent times, the fan will normally be presented with a homepage interface. The homepage interface will typically include an operations bar or buttons, normally at an upper or lower portion and the operations bar will normally include a home button, search button, a record button, access to the storage facility having the stored messages and replies and a profile button allowing the fan to edit their profile. A generic homepage will also typically include action buttons allowing the fan to ask a question or create a fan message and/or answer a question or fan message.

[0137] A similar process to that described above will typically be followed when an influencer logs into the system for the first time. As an influencer using the system, the influencer will typically open the secondary software application operating on their personal computing device by tapping the application icon or tile. The first time an influencer uses the application, the influencer will normally be presented with a signup interface generated and displayed on the display of the personal computing device. The signup interface will prompt the creation of an influencer profile including entry of salient information such as the influencer's name, email address, preferred password and a picture or image to be used as the profile image.

[0138] The influencer profile may include other information such as gender, date of birth, address, preferences and/or interests although this information is optional and may be added at a later time into the influencer profile. The influencer profile is preferably stored in a user profile in association with the primary software application operating on the computer server or computer network. The system administrator may undertake a vetting process when an influencer creates a new profile or updates their profile to ensure that the information added into the profile is not scandalous or contrary to law in any way and/or that the information added complies with the information required by the system.

[0139] Entry of information into the secondary software application is preferably using a virtual keyboard which is produced and displayed on the interface, normally as an overlay and/or uploaded using the personal computing device, particularly, using the image capture software present on the personal computing device and/or the audio capture software. The information will normally be entered into one or more entry fields provided on the interface and there will normally be one or more action buttons on the interface to allow entry of information and/or movement about the interface.

[0140] Once the influencer has set up their user profile, there will normally be an action button allowing them to continue. The influencer can manually activate the "continue" action button and activation of this button will normally trigger generation and display of a further interface.

[0141] Influencers will preferably have the ability to set a topic for discussion and edit that topic as required in order to prompt or maintain the interest of fans. The influencer may be incentivised in order to maintain the interest of fans and be rewarded or incentivised according to the number of fans requesting interaction with the influencer.

[0142] Importantly, the influencer will be able to view fan messages and answer fan messages with an appropriate action button provided on an interface generated and displayed on the influencer personal computing device. The influencer will typically be given an indication of the number of pending fan messages that are awaiting an answer, normally on a home screen interface. Such an interface may also include a recent activity portion which includes or identifies information relating to the influencer's recent activity. Where more than one entry occurs on the recent activity portion, the recent activity portion may be movable to advance through the recent activity posts. Normally, this is achieved by sliding or swiping the recent activity portion.

[0143] When the influencer selects the action button allowing the influencer to answer a fan message, the new interface is typically generated and displayed on the personal computing device of the influencer and at the same time, the audio and visual capture devices of the personal computing device are preferably activated so that the influencer can see a real-time image of themselves on the display of their personal computing device and also a preview portion showing previews of the unanswered fan messages.

[0144] Normally, the preview portion includes at least a screenshot "still" from the fan message or the fan's profile picture, preferably the thumbnail chosen by the first user or fan as outlined above. The influencer can move through the pending fan messages by direct manipulation on the display of the personal computing device such as by swiping or sliding for example. Selection of a particular fan message to be answered by the influencer will preferably cause the secondary software application operating on the influencer personal computing device to start capture of video and/or audio as the fan message plays in order to capture the reaction recording in real time. The selection may occur in any way using any motion on the display to initiate the selection and the capture of the reaction recording.

[0145] Preferably, the fan message will display in a different portion of the interface to the influencer image so that the influencer can see the fan message being played as well as having the influencer image captured and played back to the influencer in real time. Preferably, it will be possible for the influencer to pause the playback of the fan message and/or stop the playback of the fan message. However, in order to capture the most realistic reaction recording, it is preferred that once the fan message has been initiated, the influencer cannot pause or stop the fan message until the end of the fan message.

[0146] Once the fan message has ended, and the reaction recording has taken place, the influencer can then record a response to the fan message. This is typically done through a similar process as the fan recording a fan message as explained above. Once the influencer has recorded their response, the influencer can typically review the response and either dump or trash the response.

[0147] Preferably, the secondary software application will generate and display a simple thumbnail selection process allowing a second person to select a thumbnail image to accompany their video message as an identifier to be presented to the first person. Once the recording of the second person video message is completed, the second person is preferably presented with a number of thumbnail still images extracted from the recorded sequence of the video component of the second person video message. The second user can preferably tap any of the thumbnail images to select and present the chosen thumbnail image that appears next to the second person video message when presented to the first person. This allows the second user to select the most interesting or appealing thumbnail image to display to the first person upon receiving the message.

[0148] Another pre-processing step that is preferably implemented by the secondary software application operating on the second person personal computing device is cropping of recorded video messages to a rectangular and preferably square format. In a particularly preferred embodiment, the rectangular format is provided with a circular viewfinder overlay to highlight the features within the circular viewfinder and obscure the features outside the circular viewfinder. This can be done in any way but preferably a circular viewfinder layer is provided about the identified centre of the visual component and everything outside the circular viewfinder is then obscured, for example by blurring or defocussing.

[0149] This is typically achieved by cropping about a personal computer device specific centre aided by a circular viewfinder to assist with a meaningful crop. This in turn allows a reduction in the video file size and optimises the transfer time of the data relating to the pre-processed video component of the second person video message. Preferably the device specific centre is set by the operating software of the personal computer device. Most personal computer devices have operating software that allows the device to detect the camera's orientation and/or the centre of the camera recording zone using the hardware of the personal computer device. The system of the present invention will therefore preferably utilise the personal computer device's own operating software to locate the centre of the recorded visual component.
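The circular viewfinder overlay of paragraph [0148] might be sketched as a boolean mask over a square frame, as below. Frames are represented as plain nested lists for the sake of a self-contained example, and the `obscure` callback is a placeholder for whatever blurring or defocussing is actually applied; both are assumptions of this sketch.

```python
def circular_mask(size: int, radius=None):
    """Return a size x size grid of booleans that are True inside the viewfinder circle."""
    r = size / 2 if radius is None else radius
    c = (size - 1) / 2          # centre of the pixel grid
    return [[(x - c) ** 2 + (y - c) ** 2 <= r * r for x in range(size)]
            for y in range(size)]

def apply_viewfinder(frame, mask, obscure):
    """Keep pixels inside the circle; pass pixels outside it through `obscure`."""
    return [[px if keep else obscure(px) for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(frame, mask)]
```

For example, calling `apply_viewfinder(frame, circular_mask(5), lambda px: 0)` on a 5 x 5 frame leaves the centre pixel untouched and zeroes the four corner pixels.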

[0150] A feature-detection (or facial recognition) algorithm may be performed over each frame in the recorded visual component to determine the orientation and/or position of the person's face. The aim of feature detection when used in this way is preferably two-fold: (1) locate a plane using the eyes as reference, i.e. a straight line can be drawn between pupil centers, and extended out to the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, and (2) locate a plane where the shoulders of the user meets the edge of the recorded visual component or a viewfinder overlay applied to the recorded visual component, i.e. connect the points where shoulders meet the edges using a straight line.

[0151] This may allow the secondary software application to center the user's face in the crop zone and crop around that center.

[0152] Once completed, the second user/influencer can send/submit the response.

[0153] The audio component of the second person video message preferably needs to be cleared of echo caused when the second person's personal computing device speaker is playing the audio component of the first person video message while the second person's personal computing device microphone is open. Acoustic echo cancellation is preferably used to remove or at least diminish an echo. After applying this cancellation technique to the audio component of a captured second person video message, the echo is preferably removed but it is still difficult to hear what the user is saying, particularly if there is a significant level of ambient noise (street, party etc.). Noise suppression is therefore preferably applied. Noise suppression is unfortunately destructive to the audio captured; in other words, it removes some of the intended sound as well as the background noise. To recover the speech component, which is important in the context of the present invention, a third technique called acoustic gain control is typically applied. This enhances the audio back to pre-noise suppression levels of voice-like sounds and makes the audio more audible.
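One well-known way to perform acoustic echo cancellation is a normalised least mean squares (NLMS) adaptive filter, sketched below. The specification does not state which cancellation algorithm is used, so NLMS, the function name `nlms_echo_cancel` and the parameter values are assumptions. Here `far_end` is the first person audio being played through the speaker and `mic` is the signal captured by the microphone.

```python
def nlms_echo_cancel(mic, far_end, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the microphone signal."""
    weights = [0.0] * taps
    history = [0.0] * taps              # most recent far-end samples, newest first
    out = []
    for d, x in zip(mic, far_end):
        history = [x] + history[:-1]
        # Estimate the echo currently present in the microphone sample.
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate                # residual: near-end speech plus uncancelled echo
        # Normalised update keeps the step size independent of signal level.
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        out.append(e)
    return out
```

Once the filter has adapted, the residual contains mostly the second person's own speech and ambient noise, which is why noise suppression is applied afterwards.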

[0154] The audio pre-processing preferably occurs on the personal computing device used to capture the video message. This will preferably optimise the audio component of the captured video message for upload to the primary software application operating on the computer network or server.

[0155] Most personal computing devices used within the system of the present invention capture or record the visual component of the video message in full width (720p/HD or

[0156] Once sent or submitted, the influencer response will be conveyed to the primary software application operating on the computer network or computer server.

[0157] After the message & response video are uploaded, the actual composing pipeline starts. In this process, the primary software application deals with the problem of different personal computing devices having different recording profiles (camera resolution, audio performance etc), captured video messages being recorded in different geographical locations, generating different assets depending on video type and outputting the resulting collateral into a format that can be easily consumed by client devices.
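To illustrate how frames from devices with different camera resolutions might be brought to a common size and aspect ratio, the sketch below computes a scale factor and a centre crop. The 480 pixel square target and the function name `normalise_geometry` are assumptions for illustration only.

```python
def normalise_geometry(width: int, height: int, target: int = 480):
    """Return (scale, crop_left, crop_top) mapping any frame to a target x target square."""
    # Scale so the shorter side exactly matches the target size.
    scale = target / min(width, height)
    scaled_w, scaled_h = round(width * scale), round(height * scale)
    # Centre-crop whatever overhangs on the longer side.
    return scale, (scaled_w - target) // 2, (scaled_h - target) // 2
```

A 1920 x 1080 recording and a 720 x 720 recording would both end up as 480 x 480 frames, the former after trimming 186 pixels from each side of the scaled image.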

[0158] A key part of the composition is the creation of a circular concatenated video master file of a composed message, which is used to create a novel and engaging presentation through animation of the visual representations of the first person video message and the second person video message in the concatenated video master file. This is achieved partly through the cropping and centering described above and partly through an animation process.

[0159] There are two preferred embodiments of animation of the first person video message and the second person video message, namely a first preferred embodiment, one form of which is illustrated in Figure 8, where only the reaction of the asking party is captured and included in the composited output, and a second preferred embodiment, one form of which is illustrated in Figure 9, where both reactions of the asking party and responding party are captured, and both are included in the composited output.

[0160] As the video messages from the first person and second person are preferably in circular format, as are the faces that are usually captured, a compact yet comfortable and engaging viewing experience is possible through animation of the first person video message and the second person video message.

[0161] In a preferred embodiment, the visual representations of the video messages will preferably be concatenated to be displayed in different positions relative to one another in the concatenated video master file in order to form an intimate visual connection or relationship between the visual representations of the first person video message and the second person video message with appropriate hierarchical prominence or significance provided for each visual representation dependent upon the position in the playback of the concatenated composed message.

[0162] This will preferably involve positioning the visual representations of the first person video message and the second person video message about the interface and moving the visual representations as required during the playback. The respective positions and movements will normally be applied to each of the visual representations of the first person video message and the second person video message during the composition or concatenation process undertaken by the primary software application.

[0163] Positions such as which of the visual representations of the first person video message and the second person video message is in the foreground, movement of the visual representations of the first person video message and the second person video message into and out of the interface, and overlapping of the visual representations of the first person video message and the second person video message (both in terms of degree of overlap and which overlaps which) can be used.

[0164] According to the first preferred embodiment, one form of which is illustrated in Figure 8, when a first person is asking a question and the second person is listening and a reaction recording is captured, the visual representations of the two video messages can be centrally located and overlapped to imply an intimate connection or relationship between the visual representations of the first person video message and the second person video message, giving some natural emphasis to the video message of the first person when asking the question, for example by having the first person video appear in the foreground, and slightly over the top of the second person's video containing the reaction to the question. At the end of the first person's video message, the visual representation of the first person video message can move off the interface (for example to one side) and the visual representation of the second person video message can be moved to the centre of the interface.
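The positioning described for the first preferred embodiment might be expressed as a function of playback time, as in the sketch below. The canvas size, the overlap offset, the half-second slide duration and the function names are assumed values chosen for illustration; the returned list is ordered back to front, so the last entry is drawn in the foreground.

```python
def lerp(a, b, p):
    """Linear interpolation from a (p = 0) to b (p = 1)."""
    return a * (1 - p) + b * p

def layout_first_embodiment(t, question_end, canvas=480, slide_s=0.5):
    """Return a back-to-front draw list of (name, centre_x, centre_y) at time t seconds."""
    mid = canvas / 2
    offset = canvas * 0.15              # assumed offset that makes the two circles overlap
    if t < question_end:
        # Question playing: both near the centre, first person drawn last (foreground).
        return [("second", mid + offset, mid), ("first", mid - offset, mid)]
    # After the question: first person slides off to the left, second moves to the centre.
    p = min((t - question_end) / slide_s, 1.0)
    first_x = lerp(mid - offset, -canvas / 2, p)
    second_x = lerp(mid + offset, mid, p)
    return [("second", second_x, mid), ("first", first_x, mid)]
```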

[0165] According to the second preferred embodiment, one form of which is illustrated in Figure 9, the model in the first preferred embodiment may be extended to alternate between the visual representation of the first person video message and the visual representation of the second person video message in the form of a reaction and then the visual representation of the second person video message in the form of a video reply to the first person, and then the visual representation of the first person video message reaction of the first person listening to the second person's video reply message. In this embodiment, the visual representations of the video messages will preferably be swapped in prominence as the respective video messages are being delivered.
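The swapping of prominence in the second preferred embodiment can be sketched as a simple schedule of segments. The segment labels and function names below are hypothetical, and the sketch assumes that each reaction lasts as long as the message being reacted to.

```python
def prominence_schedule(message_s: float, reply_s: float):
    """Return [(start, end, foreground, background)] for the two-reaction embodiment."""
    return [
        # First person speaks; second person's reaction sits behind.
        (0.0, message_s, "first_message", "second_reaction"),
        # Second person replies; first person's reaction sits behind.
        (message_s, message_s + reply_s, "second_reply", "first_reaction"),
    ]

def foreground_at(schedule, t: float):
    """Return the name of the representation in the foreground at time t, or None."""
    for start, end, foreground, _background in schedule:
        if start <= t < end:
            return foreground
    return None
```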

[0166] The animation will preferably be applied by the primary software application once the primary software application has received the normalised audio components and normalised video components of both the first person video message and the second person video message. One or more template interfaces may be provided into which the primary software application locates the normalised video components.
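As one illustration of what normalising the audio level between the two audio components could involve, the sketch below scales both tracks to a shared root mean square (RMS) level. Matching on RMS, using the mean of the two levels as the target and the function names are assumptions; clipping protection is omitted for brevity.

```python
import math

def rms(samples):
    """Root mean square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def match_levels(first, second):
    """Scale both tracks to their common mean RMS so neither sounds louder than the other."""
    r1, r2 = rms(first), rms(second)
    if r1 == 0.0 or r2 == 0.0:
        return list(first), list(second)   # nothing sensible to match against
    target = (r1 + r2) / 2
    return [s * target / r1 for s in first], [s * target / r2 for s in second]
```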

[0167] The primary software application will then composite or concatenate the video message, namely the fan message, the reaction recording and the influencer response into a single concatenated video master file. It is important to realise at this juncture that the reaction recording and the influencer response may be substantially continuously recorded and therefore, although there may be two portions to the message, such a combined or continuously recorded message may be a single message and may be combined with the fan message to form a composite message. The composite message is then normally sent back to at least the fan. The composite message may be also loaded onto the influencer profile and will typically be stored against both the influencer profile and the fan profile.
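Where the reaction recording and the influencer response are recorded continuously as a single message, the boundary between the two portions can be taken from the duration of the fan message, as in the following sketch. The function name `split_recording` and the assumption that recording starts at the same instant as fan message playback are illustrative only.

```python
def split_recording(recording_s: float, fan_message_s: float):
    """Return ((reaction_start, reaction_end), (response_start, response_end)) in seconds."""
    # The reaction portion lasts as long as the fan message was playing.
    cut = min(fan_message_s, recording_s)
    return (0.0, cut), (cut, recording_s)
```

For a 45 second continuous recording made against a 30 second fan message, the first 30 seconds are treated as the reaction and the remaining 15 seconds as the response.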

[0168] The fan and/or the influencer can also share the concatenated video master file with third parties. Preferably, a still shot of the composite message (or a still shot of a portion of the message) may be provided as a part of the sharing of the composite message. Text can preferably be added by either the fan and/or the influencer to the sharing of the composite message.

[0169] The composite message will typically play the first person message and the second person reaction response at substantially the same time but preferably slightly delay the second person reaction response message in order to simulate a short reaction time between the first person message and the influencer reaction response. This will typically elevate the realism provided by the system of the present invention. Once the first person message and second person reaction response have finished, the second person response will typically play.
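The playback timing of paragraph [0169] can be sketched as a small timeline calculation. The default delay of 0.3 seconds and the function name `playback_timeline` are assumptions; the specification says only that the reaction is slightly delayed.

```python
def playback_timeline(message_s, reaction_s, response_s, reaction_delay_s=0.3):
    """Return start and end times (seconds) for each part of the composite message."""
    # The reaction starts slightly after the message to simulate a reaction time.
    reaction_start = reaction_delay_s
    reaction_end = reaction_start + reaction_s
    # The response plays only once both the message and the reaction have finished.
    response_start = max(message_s, reaction_end)
    return {
        "message": (0.0, message_s),
        "reaction": (reaction_start, reaction_end),
        "response": (response_start, response_start + response_s),
    }
```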

[0170] In the present specification and claims (if any), the word 'comprising' and its derivatives including 'comprises' and 'comprise' include each of the stated integers but do not exclude the inclusion of one or more further integers.

[0171] Reference throughout this specification to 'one embodiment' or 'an embodiment' means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases 'in one embodiment' or 'in an embodiment' in various places throughout this specification are not necessarily all referring to the same embodiment.

Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more combinations.

[0172] In compliance with the statute, the invention has been described in language more or less specific to structural or methodical features. It is to be understood that the invention is not limited to specific features shown or described since the means herein described comprises preferred forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims (if any) appropriately interpreted by those skilled in the art.

Claims

1. A method for intimate message and response video composing, the method including the steps of

a) a first user capturing a first person video message having an audio component and a visual component using a first person personal computing device,

2. A system for intimate message and response video composing, the system including

a) at least one computer server or computer network operating a primary software application

c) a second person personal computing device having an audio capture device, a video capture device and data transmission capability and operating a second software application used to capture a second person video message having an audio component and a visual component, and to apply at least echo cancellation, noise suppression and gain control to the audio component of the second person video message on the second person personal computing device to create a pre-processed second person video message and uploading the pre-processed second person video message to the at least one computer server or computer network

d) the at least one computer server or computer network normalising the visual component of both the pre-processed first person video message and the pre-processed second person video message to substantially conform in size, aspect ratio and format, normalising the audio component of both the pre-processed first person video message and the pre-processed second person video message to substantially conform in audio quality and format, normalising an audio level between the respective audio components of the normalised first person video message and the normalised second person video message; and

3. A system for intimate message and response video composing, the system including a first person personal computing device having an audio capture device, a video capture device and data transmission capability and operating a second software application used to capture a first person video message having an audio component and a visual component, and to apply at least noise suppression and gain control to the audio component of the first person video message on the first person personal computing device to create a pre-processed first person video message and uploading the pre-processed first person video message to at least one computer server or computer network.

4. A system for intimate message and response video composing, the system including a second person personal computing device having an audio capture device, a video capture device and data transmission capability and operating a second software application used to capture a second person video message having an audio component and a visual component, and to apply at least echo cancellation, noise suppression and gain control to the audio component of the second person video message on the second person personal computing device to create a pre-processed second person video message and uploading the pre-processed second person video message to at least one computer server or computer network.

5. A system for intimate message and response video composing, the system including at least one computer server or computer network receiving at least one first person video message having an audio component and a visual component, receiving at least one second person video message having an audio component and a visual component, normalising the visual component of both the first person video message and the second person video message to substantially conform in size, aspect ratio and format, normalising the audio component of both the first person video message and the second person video message to substantially conform in audio quality and format, normalising an audio level between the respective audio components of the normalised first person video message and the normalised second person video message; and outputting a concatenated video master file of a composed message including the normalised audio components and normalised video components of both the first person video message and the second person video message to emulate a synchronous video conversation from asynchronously captured video messages.