SE2050318A1 - A system - Google Patents

A system

Info

Publication number
SE2050318A1
Authority
SE
Sweden
Prior art keywords
interest
level
score
audio
signal
Prior art date
Application number
SE2050318A
Inventor
Ali Ghadirzadeh
Danica Kragic Jensfelt
Mårten Björkman
Original Assignee
Croseir Ab
Priority date
Filing date
Publication date
Application filed by Croseir Ab
Priority to SE2050318A priority Critical patent/SE2050318A1/en
Priority to PCT/EP2021/057214 priority patent/WO2021191126A1/en
Publication of SE2050318A1 publication Critical patent/SE2050318A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/015 Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/377 Electroencephalography [EEG] using evoked responses
    • A61B5/378 Visual stimuli
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/377 Electroencephalography [EEG] using evoked responses
    • A61B5/38 Acoustic or auditory stimuli
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/117 Identification of persons
    • A61B5/1171 Identification of persons based on the shapes or appearances of their bodies or parts thereof
    • A61B5/1176 Recognition of faces
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/163 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state by tracking eye movement, gaze, or pupil change
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Veterinary Medicine (AREA)
  • Surgery (AREA)
  • Molecular Biology (AREA)
  • Public Health (AREA)
  • Psychiatry (AREA)
  • Psychology (AREA)
  • Medical Informatics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pathology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Dermatology (AREA)
  • Neurosurgery (AREA)
  • Neurology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A system (300) comprising: an output device (302) configured to provide an audio and/or visual stimulation (114) to a user (304); and one or more biometric sensors (306) that are configured to provide biometric-signalling (308), which is representative of body measurements of the user (304) while they are exposed to the audio and/or visual stimulation (114). The system (300) further comprises a processor (310) configured to: process the biometric-signalling (308) in order to determine an interest-level-score (222); and provide a control-signal (312) to the output device (302) based on the interest-level-score (222), wherein the control-signal (312) is for adjusting the audio and/or visual stimulation (114) that is provided by the output device (302).

Description

A SYSTEM

The present disclosure relates to a system for providing audio and/or visual stimulation to a user.
According to a first aspect of the present disclosure there is provided a system comprising: an output device configured to provide an audio and/or visual stimulation to a user; one or more biometric sensors that are configured to provide biometric-signalling, which is representative of body measurements of the user while they are exposed to the audio and/or visual stimulation; and a processor configured to: process the biometric-signalling in order to determine an interest-level-score; and provide a control-signal to the output device based on the interest-level-score, wherein the control-signal is for adjusting the audio and/or visual stimulation that is provided by the output device.
Such a system can advantageously adjust audio-visual content in response to a determined interest-level-score to iteratively optimize the audio-visual content. The system can enable new problems to be solved, such as to search for visual contents in a user's brain, which was not possible before.
The processor may be configured to iteratively process the biometric-signalling and provide an updated control-signal until: a target interest-level-score is reached; a pre-defined number of iterations are performed; or a rate of change of the interest-level-score between multiple iterations is less than a target value.
The processor may be configured to provide an output-signal that is representative of: the audio and/or visual stimulation that is provided by the output device as part of the last iteration; or the audio and/or visual stimulation that is associated with the highest interest-level-score.
The processor may be configured to: iteratively process the biometric-signalling to determine a plurality of interest-level-scores, wherein each interest-level-score is associated with an instance of the audio and/or visual stimulation; iteratively provide a plurality of control-signals to the output device based on associated ones of the plurality of interest-level-scores; determine one of the interest-level-scores as a selected-interest-level-score by applying a function to the plurality of interest-level-scores; and provide an output-signal that is representative of the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score.
The output-signal may comprise: an identifier of the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score; and/or the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score.
The output device may be a multimedia device, such as a multimedia interface.
The one or more biometric sensors comprise one or more electroencephalography (EEG) sensors. The biometric-signalling may comprise EEG-signalling, which is representative of electrical activity in a user's brain while they are exposed to the audio and/or visual stimulation. The one or more EEG sensors may be attachable to the user's scalp or forehead.
The one or more biometric sensors may comprise one or more pupil size sensors. The biometric-signalling may comprise pupil-size-signalling, which is representative of the size/dilation of the user's pupil while they are exposed to the audio and/or visual stimulation.

The processor may be configured to iteratively process the biometric-signalling and provide a new control-signal periodically, for example every 0.5 seconds.
The system may be a facial reconstruction system. The output device may be configured to provide a visual stimulation to the user that represents a person's face. The control-signal may be for adjusting one or more features of the person's face that is provided by the output device.
The system may be a machine support system. The output device may be configured to provide a visual stimulation to the user that comprises a visual representation of how a subsequent operation can be performed by at least one machine in the machine support system. The control-signal may be for adjusting one or more features of the visual representation of how the subsequent operation can be performed.
The processor may be configured to: determine a selected-visual-stimulation as: the visual stimulation that is provided by the output device as part of the last iteration; or the visual stimulation that is associated with the highest interest-level-score; and provide a machine-control-signal for automatically controlling the at least one machine based on the selected-visual-stimulation.
The output device may be provided as part of a smart phone.
The processor may comprise: a biometric processor that is configured to process the biometric-signalling in order to determine the interest-level-score by applying a deep artificial neural network.
The processor may comprise: a generative model that is configured to generate the control-signal (which is used for adjusting the audio and/or visual stimulation) by processing a latent variable input which uniquely or stochastically characterizes the generated output stimuli. Examples of these generative models are generative adversarial networks and Variational Autoencoders. In this way, the processor further comprises: a latent-variable generative adversarial network model (or other generative model) that is configured to generate the control-signal based on the latent variable.
The processor may further comprise: an optimizer that is configured to generate the latent variable values based on the interest-level-score as part of providing the control-signal. The optimizer may be configured to: receive descriptive data; and process the descriptive data when determining the latent variable. In this way, the latent variable can be prohibited from taking one or more values that are inconsistent with the descriptive data.
There is also disclosed a computer implemented method comprising: processing biometric-signalling in order to determine an interest-level-score, wherein the biometric-signalling is representative of body measurements of a user while they are exposed to audio and/or visual stimulation; and providing a control-signal to an output device based on the interest-level-score, wherein the control-signal is for adjusting the audio and/or visual stimulation that is provided by the output device.
There may be provided a computer program, which when run on a computer, causes the computer to configure any apparatus, including a system, processor, controller, robot, or device disclosed herein or perform any method disclosed herein. The computer program may be a software implementation, and the computer may be considered as any appropriate hardware, including a digital signal processor, a microcontroller, and an implementation in read only memory (ROM), erasable programmable read only memory (EPROM) or electronically erasable programmable read only memory (EEPROM), as non-limiting examples. The software may be an assembly program.
The computer program may be provided on a computer readable medium, which may be a physical computer readable medium such as a disc or a memory device, or may be embodied as a transient signal. Such a transient signal may be a network download, including an internet download. There may be provided one or more non-transitory computer-readable storage media storing computer-executable instructions that, when executed by a computing system, cause the computing system to perform any method disclosed herein.
One or more embodiments will now be described by way of example only with reference to the accompanying drawings in which:

Figure 1 shows an example embodiment of a system that provides audio and/or visual stimulation to a user;
Figure 2 shows an example embodiment of a processor, which can be used in the system of Figure 1;
Figure 3 illustrates a further example embodiment of a system;
Figures 4A, 4B and 4C illustrate three views of an example embodiment of a headset that can be used as part of a system described herein;
Figure 5 shows an example embodiment of a system that can search for faces;
Figure 6 shows an example embodiment of a system that can function as a human-machine interface and control a robot; and
Figure 7 illustrates schematically an example embodiment of a computer implemented method.
Figure 1 shows an example embodiment of a system 100. The system 100 includes an output device 102 that can provide an audio and/or visual stimulation 114 to a user 104. In some examples, the output device 102 can be a portable electronic device such as a smart phone, tablet computer or laptop computer. The output device 102 may provide stimulation 114 to the user 104 using a single medium of expression (such as audio or visual), or stimulation using a plurality of media of expression.
The system 100 also includes one or more biometric sensors 106. Each biometric sensor 106 can provide biometric-signalling 108, which is representative of body measurements of the user 104 while they are exposed to the audio and/or visual stimulation 114. Various examples of biometric-signalling 108 are described below. In an example where the output device 102 is a display screen that provides an image of a person's face as visual stimulation 114, the biometric-signalling 108 can be representative of the user's response to the face that is being displayed to them.
The system further includes a processor 110. The processor 110 can process the biometric-signalling 108 in order to determine an interest-level-score (not shown). An example of how such an interest-level-score can be determined is provided below. The interest-level-score is representative of the user's 104 interest in the audio and/or visual stimulation 114 to which they are being exposed. The processor 110 can then provide a control-signal 112 to the output device 102 based on the interest-level-score. The control-signal 112 is for adjusting the audio and/or visual stimulation that is provided by the output device 102. In this way, the system 100 can automatically adjust the audio and/or visual stimulation 114 that is provided to the user 104 based on the user's interest in the previously presented stimulation. This can result in the audio and/or visual stimulation being iteratively adjusted based on the user's 104 interest.
As will be discussed in detail below, the system 100 can iteratively search in the user's 104 brain for audio-visual associations that maximize a given objective function of the interest-level-score. This can be achieved by the system 100 providing stimulus 114 to the user 104, and then refining that stimulus 114 to match an unknown target.
Figure 2 shows an example embodiment of a processor 210, which can be used in the system of Figure 1. The processor 210 includes an interest calculator 216, a control-signal calculator 218 and a loop controller 220.
The processor 210 receives biometric-signalling 208, such as from the one or more biometric-sensors that are shown in Figure 1. The interest calculator 216 iteratively processes the biometric-signalling 208 to determine a plurality of interest-level-scores 222. As discussed with reference to Figure 1, each interest-level-score 222 is associated with an instance of the audio and/or visual stimulation to which they were being exposed when the biometric-signalling 208 was recorded. The interest calculator 216 iteratively provides the plurality of interest-level-scores 222 to the control-signal calculator 218. The control-signal calculator 218 can then iteratively provide a plurality of control-signals 212 to the output device (not shown) based on associated ones of the plurality of interest-level-scores 222. In this way, for each iteration, the processor 210 calculates an interest-level-score 222 for a current instance of the audio and/or visual stimulation, and determines a control-signal 212 for adjusting the audio and/or visual stimulation for the next iteration.

In this example, the processor 210 also includes a loop controller 220. The loop controller 220 receives the plurality of interest-level-scores 222, one for each iteration. In some examples, the loop controller 220 can store the plurality of interest-level-scores 222 for subsequent processing. In other examples the loop controller 220 can process the received interest-level-scores 222 "on the fly" as they are received. The loop controller 220 can perform one or more of the functionalities that are described below. It will be appreciated that the functionality of the loop controller 220 may be provided by a single processor or may be distributed across a plurality of processors.
The loop controller 220 can automatically control the interest calculator 216 and/or the control-signal calculator 218 such that the processor 210 iteratively processes the biometric-signalling 208 and provides an updated control-signal 212 until: a target interest-level-score is reached; a pre-defined number of iterations are performed; or a rate of change of the interest-level-score between multiple iterations is less than a target value.
For example, the loop controller 220 can compare a current interest-level-score 222 with a target interest-level-score. If the interest-level-score 222 is less than the target interest-level-score, then the loop controller 220 may control the control-signal calculator 218 and the interest calculator 216 such that they generate a new control-signal 212 and calculate a new interest-level-score 222. That is, the processor 210 performs another iteration if the interest-level-score is less than a target interest-level-score. If the interest-level-score is greater than or equal to the target interest-level-score, then the loop controller 220 may determine that no further iterations are required. Therefore, the loop controller 220 can control the control-signal calculator 218 and/or the interest calculator 216 such that no further iterations are performed once the target interest-level-score is reached.

Optionally, the loop controller 220 can maintain a count of the number of iterations that have been performed in a current operation. The loop controller 220 can compare the count with a pre-defined number of iterations. If the count is less than the pre-defined number of iterations, then the loop controller 220 may control the control-signal calculator 218 and the interest calculator 216 such that they generate a new control-signal 212 and calculate a new interest-level-score 222. If the count equals the pre-defined number of iterations, then the loop controller 220 can control the control-signal calculator 218 and/or the interest calculator 216 such that no further iterations are performed.

In some examples, the loop controller 220 may calculate a rate of change of the interest-level-score 222 between multiple iterations. This may simply be the difference between a current interest-level-score 222 and an immediately preceding interest-level-score 222, or may involve a more sophisticated function based on a plurality of preceding interest-level-scores 222. The loop controller 220 can compare the determined rate of change of the interest-level-score with a target value, and take similar action to that discussed above based on the result of the comparison. In this way the loop controller 220 can control the control-signal calculator 218 and/or the interest calculator 216 such that they stop performing further iterations when the rate of change drops below a target level.

In the example of Figure 2, the loop controller 220 determines one of the interest-level-scores 222 for a plurality of iterations as a selected-interest-level-score by applying a function to the plurality of interest-level-scores 222. Applying such a function may involve selecting the highest interest-level-score as the selected-interest-level-score. Alternatively, applying such a function may involve selecting: the lowest interest-level-score as the selected-interest-level-score; or the interest-level-score that is closest to a target-interest-score, as the selected-interest-level-score. It will be appreciated that the nature of the function will depend on the particular application with which the processor 210 is being used.
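By way of non-limiting illustration only, the stopping criteria and selection function described above can be sketched in a few lines of Python. The class name, threshold values and the choice of the maximum as selection function are illustrative assumptions rather than features of the disclosure.

```python
# Illustrative sketch of the loop controller's stopping criteria.
# Class name, thresholds and the selection function are assumptions.

class LoopController:
    def __init__(self, target_score=0.9, max_iterations=100, min_rate=1e-3):
        self.target_score = target_score      # stop once this score is reached
        self.max_iterations = max_iterations  # stop after this many iterations
        self.min_rate = min_rate              # stop when improvement stalls
        self.scores = []                      # one interest-level-score per iteration

    def record(self, interest_level_score):
        self.scores.append(interest_level_score)

    def should_stop(self):
        # (1) a target interest-level-score is reached
        if self.scores and self.scores[-1] >= self.target_score:
            return True
        # (2) a pre-defined number of iterations have been performed
        if len(self.scores) >= self.max_iterations:
            return True
        # (3) the rate of change between iterations is below a target value
        if len(self.scores) >= 2 and abs(self.scores[-1] - self.scores[-2]) < self.min_rate:
            return True
        return False

    def selected_score(self):
        # Example selection function: the highest interest-level-score.
        return max(self.scores)
```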
The loop controller 220 can optionally provide an output-signal 224 that is representative of the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score. The output-signal 224 may comprise an identifier (for example a filename or a description of the stimulation) of the instance of the audio and/or visual stimulation. Additionally or alternatively, the output-signal 224 may include the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score. In this way the output-signal 224 can include the results of the search for a previously unknown target, which in some examples is the audio and/or visual stimulation that is associated with the highest interest-level-score.

In some examples, the processor 210 can provide an output-signal 224 that is representative of the audio and/or visual stimulation that is provided by the output device as part of the last iteration. In examples where the iterations cease when an acceptable interest-level-score 222 is reached, such an output-signal 224 can correspond to a desired stimulation.
Figure 3 illustrates a further example embodiment of a system 300. The system 300 can perform a closed-loop iterative brain search for audio-visual associations using EEG devices.
The system 300 includes a wearable product that has Electroencephalography (EEG) sensors 306. The EEG sensors are examples of biometric sensors. As shown in the figure, and as well known in the art, the EEG sensors 306 can be placed on a user's 304 head to measure his/her brain activity and provide EEG-signalling 308 (which can also be referred to as EEG-data). The EEG-signalling 308 is an example of biometric-signalling (biometric-data), and is representative of electrical activity in the user's 304 brain while they are exposed to the audio and/or visual stimulation. The system 300 also includes a multimedia (visual and audio) interface 302. The multimedia interface 302 is an example of an output device. The multimedia interface 302 can be a smart phone, a virtual or augmented reality device, for example, to provide audio-visual stimuli to the user 304.
The system 300 includes a processor, which in this example is the processor 310. The processor 310 is an Artificial Intelligence (AI) based system which searches for audio-visual associations in the user's 304 brain by maximizing the positive response (interest level) to iteratively produced synthetic audio-visual stimuli, responses that are decoded from the EEG-signalling 308. In neuroscience, such positive response signals can be referred to, or determined from, event-related potentials (ERPs) or P300 waves.

Advantages of using EEG sensors 306 to measure positive responses can include (1) fast feedback from the EEG-signalling 308, for instance in less than a second while beneficially not distracting the user 304 as it does not require attention, and (2) the non-invasive nature of the process.

In Figure 3, three software modules of the embedded processor 310 are shown: (1) an AI-based biometric processor 316 which processes the EEG-signalling 308 to infer the user's 304 response to the stimulation provided by the multimedia interface 302; (2) an AI-based generative model 328, which generates a control-signal 312 to cause the multimedia interface 302 to provide synthetic audio-visual content to the user 304; and (3) an AI-based optimizer 326, which optimizes for the human response by feeding the generator model 328 with an appropriate input.
The EEG sensors 306 (hardware) measure brain signals to generate the EEG-signalling 308. The EEG-signalling 308 is processed by the biometric processor 316 (software) to calculate a measured interest level 322 (which is an example of an interest-level-score). Such an interest level 322 can also be referred to as a positive biometric response. The biometric processor 316 can continuously calculate a measured interest level 322 by decoding the EEG-signalling 308, or it can periodically calculate the measured interest level 322.
The biometric processor is a deep artificial neural network (deep network) which consists of a number of trainable parameters. In some examples, the biometric processor implements a regression model. The trained model receives the raw biometric signals, e.g., raw EEG data, over a short period of time (such as a few hundred milliseconds) as an input. The trained model can then output an interest score value, which reflects how close the current stimuli (that the person is exposed to) is to the target stimuli that the system is searching for.
The model of such a biometric processor can be trained using a training dataset that includes pairs of true input-output data. The training dataset can be constructed by asking a number of human participants to remember a given stimuli. For the example of a facial reconstruction task, the stimuli would be a human face. Therefore, in this example, each participant will remember a given face (target face). The participant will then be exposed to a set of synthetic human faces, some of which are similar to the target face. The similarity is quantified based on the distance between a latent variable value corresponding to the given face and a latent variable value corresponding to the target face. The latent space here is the latent space of the generative model 328 that generates the synthetic facial images. Therefore, each training pair in the training dataset is constructed as the following: (i) the input data is the sequence of biometric measures, such as a few hundreds of milliseconds of EEG-signalling 308, while the participant is exposed to the stimuli; and (ii) the output data is a similarity measure between the generated stimuli and the target stimuli, e.g., quantified by a distance metric, e.g., Euclidean distance, between the latent variable values corresponding to the generated and the target stimuli. Alternatively, the output data for the training can be provided by an operator based on their subjective opinion of the similarity between the stimuli (such as the similarity between: (i) the target face; and (ii) the synthetic human face that was displayed to the participant when the input data was recorded). Once a sufficiently large training dataset is constructed, the trainable parameters of the network are updated such that for every training pair, setting the training input as the input of the network, the produced output of the network is as close as possible to the output of the corresponding training data. This paradigm is known as supervised learning.
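The supervised learning paradigm described above can be sketched as follows, here using PyTorch. The network architecture, channel count, window length and hyper-parameters are illustrative assumptions; the disclosure only requires a deep network trained on pairs of biometric input windows and similarity targets.

```python
# Illustrative sketch of the supervised training described above (PyTorch).
# Architecture, channel count, window length and targets are assumptions.
import torch
import torch.nn as nn

N_CHANNELS, N_SAMPLES = 2, 128  # e.g. two electrodes, a few hundred ms of EEG

class BiometricProcessor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(N_CHANNELS * N_SAMPLES, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),  # a single scalar interest-level-score
        )

    def forward(self, eeg_window):
        return self.net(eeg_window).squeeze(-1)

def train(model, training_pairs, epochs=10):
    # Each training pair is (EEG window, similarity target), where the target
    # is e.g. derived from the latent-space distance between the shown
    # stimulus and the target stimulus, as described above.
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for eeg_window, similarity in training_pairs:
            optimiser.zero_grad()
            loss = loss_fn(model(eeg_window), similarity)
            loss.backward()
            optimiser.step()

# Synthetic stand-in data, purely to make the sketch runnable.
pairs = [(torch.randn(8, N_CHANNELS, N_SAMPLES), torch.randn(8)) for _ in range(4)]
train(BiometricProcessor(), pairs)
```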
The optimizer model 326 (software) receives the measured interest level 322, and iteratively optimizes the stimuli to increase this response (interest level 322). The optimizer model 326 then feeds the generative model 328 (software) (in this example with a latent variable 327) such that it generates an appropriate control-signal 312 for the multimedia interface 302. In this way, the generative model 328 can generate and adjust audio-visual synthetic data for providing to the user 304 as stimulation. The multimedia interface 302 (hardware) then displays the generated content to the user 304. In this example, all the software processes are performed by the processor 310 (hardware).
As will be discussed below, the optimizer 326 can optionally receive descriptive data, in this example textual data 334. The optimizer can also process such textual data 334 when determining the latent variable value, for instance to prohibit the latent variable from taking one or more values that are inconsistent with the textual data 334. That is, the textual data 334 can be used to apply restrictions to the position in a latent space that can be represented by the latent variable 327.
The optimizer 326 can optimize the generated stimuli to get closer to the target stimuli based on a gradient-based approach (such as a stochastic gradient descent), or based on a gradient-free approach (such as the Nelder-Mead optimization algorithm), or using Reinforcement Learning (RL), as non-limiting examples. It will be appreciated that any algorithm can be used that optimizes the measured interest level 322 for any specific application.

In some examples the generative model 328 can be a latent-variable generative adversarial network (GAN) model. The system can generate synthetic images by training the latent-variable generative model. The latent variable of the generative model is a parameter to generate different faces. The generative model itself can be implemented based on generative adversarial networks (GAN), or Variational Autoencoders (VAE), or flow-based generative models, or generally any approach that can train a generative model to generate high-dimensional data (such as images, as represented by the control-signal 312), based on a low-dimensional latent variable (such as the latent variable 327 that is output by the optimizer 326). An example of a generative adversarial network (GAN) is the InfoGAN method (Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I. and Abbeel, P., 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2172-2180)).

Every time a new face is shown to the user, the user's response is measured by reading from the user's brain while he/she is looking at the image. The response is treated as a measure that quantifies how similar the generated (synthetic) face is to the target face, i.e., the face that the user is remembering. In line with the above description, in order to measure the response the biometric processor 316 (which may be implemented as a deep neural network) has been trained such that it calculates a single output value (the interest level / positive biometric response 322) corresponding to the user's response for the measured EEG-signalling 308.
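As a non-limiting sketch of the gradient-free option named above, SciPy's Nelder-Mead implementation can be driven by an objective that returns the negated interest level. The objective below is a placeholder assumption; in the real system it would display the stimulus generated from the candidate latent variable and decode the user's EEG response.

```python
# Illustrative sketch: driving a gradient-free optimizer (SciPy's
# Nelder-Mead) over the latent space. The objective is a placeholder.
import numpy as np
from scipy.optimize import minimize

LATENT_DIM = 20  # a relatively low-dimensional latent variable

def negative_interest(latent):
    # Placeholder objective (assumption): distance to an arbitrary point.
    # The real objective would return the negated measured interest level 322.
    return float(np.sum((latent - 0.5) ** 2))

result = minimize(negative_interest, x0=np.zeros(LATENT_DIM), method="Nelder-Mead")
best_latent = result.x  # latent variable fed to the generative model
```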
The closed-loop and iterative nature of the process that is shown in Figure 3 can increase the positive response of the user's 304 brain (as represented by the measured interest level 322) in several steps. In this way the closed-loop system can search in the user's 304 brain for some audio-visual content. The loop can be repeated, for example periodically such as every 0.5 seconds, by performing the following steps: (1) the user 304 observes the new audio-visual content that is provided by the multimedia device 302; (2) the user's brain signals are recorded as EEG-signalling 308; (3) the user's 304 brain response is measured and quantified from the EEG-signalling 308 to provide a measured interest level 322; (4) an optimization algorithm is applied to the interest level 322 to generate a latent variable 327 that is intended to optimize the interest level 322; and (5) a control-signal 312 for the new content is generated by the generative model 328 based on the latent variable 327, and provided to the multimedia interface 302 such that it displays the new content.
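The five steps above can be sketched as a single closed loop. All component functions below are stubs, and the simple random-perturbation hill-climbing update is one illustrative choice of optimization algorithm, not the method prescribed by the disclosure.

```python
# Illustrative sketch of one run of the closed loop (steps 1-5 above).
# All components are stubs; names and the hill-climbing update are assumptions.
import time
import numpy as np

LATENT_DIM = 20

def display(stimulus):
    pass  # stub: send the control-signal / content to the multimedia interface

def generate_stimulus(latent):
    return latent  # stub: would run the latent-variable generative model

def read_eeg_window():
    return np.zeros((2, 128))  # stub: last few hundred ms of EEG-signalling

def interest_level(eeg_window):
    return float(np.random.rand())  # stub: trained biometric processor

latent = np.random.randn(LATENT_DIM)
best_latent, best_score = latent, -np.inf
for _ in range(20):                               # or until a stopping criterion fires
    display(generate_stimulus(latent))            # (1)+(5) user observes new content
    time.sleep(0.5)                               # loop repeats e.g. every 0.5 s
    score = interest_level(read_eeg_window())     # (2)+(3) record and decode EEG
    if score > best_score:                        # (4) keep the best latent so far
        best_latent, best_score = latent, score
    latent = best_latent + 0.1 * np.random.randn(LATENT_DIM)  # perturb and retry
```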
One example application for the systems described herein is to search for images of faces in a user's brain as he/she tries to remember a person. The user can be asked to look at random pictures of a person. Then, a system described herein can generate visual stimulation that reconstructs the face image merely by measuring the user's EEG brain data while the user views different synthetic faces generated by the system. In this way, a measured interest level can be optimized in order to result in a synthetically generated face that should be a good match for the face that the user is remembering.
As discussed above, one or more of the examples disclosed herein can advantageously generate synthetic audio-visual content using state-of-the-art AI methods, such as generative adversarial networks (GAN). A latent variable generative model can be considered as a function that maps a latent random variable into an output audio/image. When combined with an optimization algorithm, e.g., reinforcement learning, they can be used to iteratively optimize the generated content. Such examples can enable new problems to be solved, such as to search for visual contents in one's brain, which was not possible before.
Figures 4A, 4B and 4C illustrate three views of an example embodiment of a headset that can be used as part of a system described herein. The headset can include two EEG sensors that include contact electrodes that are attached to the user's scalp or face (such as the participant's forehead). In this way, a non-invasive system for providing EEG-signalling can be used. In this example the electrodes are in contact with the user such that they can monitor the activity in the brain.
Figure 5 shows an example embodiment of a system that can search for faces, in order to match a synthetically generated image of a face with a user's 504 recollection of the face. Features of Figure 5 that are also shown in Figure 3 have been given corresponding reference numbers in the 500 series, and will not necessarily be described in detail here.
The system includes a display device 502 (which is an example of an output device) that sequentially provides synthetically generated images of a face to the user 504. In the same way as discussed below, the display device provides the image based on a control-signal 512 received from a generative model 528.

It has been found that images of a face can be represented by a latent variable 527, which has a relatively low number of dimensions and therefore can be processed sufficiently quickly to enable the processes to run in real-time. In this way, at least some of the systems described herein can advantageously process very high-dimensional biometric signalling (such as EEG-signalling), determine a one-dimensional interest level 522, calculate a relatively low-dimensional latent variable (for instance 20-dimensional), and convert it into a high-dimensional audio or visual stimulation (such as the image of a face). In this way, beneficially, the optimization and generation of audio/video stimulation can be performed on low-dimensional input signalling such that the system can operate efficiently in terms of required processing resources and processing time.

In this example, a text description 534 of a face is provided to the optimization algorithm 526. The text description 534 can be used as part of a start-up routine so that the initially displayed image on the display device 502 represents a good starting point for the subsequent iterations. For instance, a text description 534 of a "40-year-old man" can be provided; in which case a stock image of a 40-year-old man's face can be provided as an initial image on the display device 502. Additionally or alternatively, the optimization algorithm 526 can use the text description 534 such that the determined latent variable 527 cannot be given a value that is inconsistent with the text description 534. For instance, if the text description indicates that the target face is a man's face, then the optimization algorithm may only generate latent variables 527 that are used by the generative model 528 to generate an image of a man's face, and not a woman's face. This may be implemented by the optimization algorithm 526 restricting the latent space in which the optimizer can operate to exclude all latent variables that are characterised as women's faces.

In this way, the system of Figure 5 can be considered as a facial reconstruction system (or a facial composite system), which includes a display device 502 that provides a visual stimulation that represents a person's face to the user 504. The system can then generate a control-signal for adjusting one or more features of the person's face that is provided by the display device 502.
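One possible way to implement such a restriction of the latent space is sketched below, under the purely illustrative assumption that a single latent dimension encodes the relevant attribute; the dimension index and projection rule are hypothetical.

```python
# Illustrative sketch of restricting the latent space from a text
# description. The single-dimension encoding and index are hypothetical.
import numpy as np

ATTRIBUTE_DIM = 3  # hypothetical latent dimension separating the attribute

def constrain_latent(latent, description):
    latent = latent.copy()
    if "woman" in description:            # checked first: "man" is a substring
        latent[ATTRIBUTE_DIM] = min(latent[ATTRIBUTE_DIM], 0.0)
    elif "man" in description:
        latent[ATTRIBUTE_DIM] = max(latent[ATTRIBUTE_DIM], 0.0)
    return latent

# Applied to every candidate before it reaches the generative model:
candidate = constrain_latent(np.random.randn(20), "40-year-old man")
```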
Figure 6 shows an example embodiment of a system that can function as a human-machine interface in order to control a robot 638. The control is based on matching images that are displayed to a user 604 with the user's thoughts about what they would like the robot 638 to do. Again, features of Figure 6 that are also shown in Figure 3 have been given corresponding reference numbers in the 600 series, and will not necessarily be described in detail here.

In a similar way to that described above, the system includes an interest detector 616 that provides an interest-level-score 622 to an optimizer 626. The optimizer 626 provides a signal (such as a latent variable) to a generative model 628, which causes a display device 602 to display an image (or a sequence of images, such as a video) to the user 604. The image or video is of an operation that can be performed by the robot 638. Such an operation may be to pick up a screwdriver, as shown in Figure 6. It will be appreciated that any potential operation, or sequence of operations, that can be performed by the robot can be displayed to the user 604. For instance, the system may be able to determine a finite list of operations that can be performed by the robot 638 based on its current operational state. The generative model 628 can then sequentially display images of the potential operations to the user.

In this example, the generative model 628 provides an output-signal 624 to a robot controller 636. In the same way as described above with reference to Figure 2, the output-signal 624 may correspond to the image or video that attracted the highest interest of the user 604, according to the recorded EEG-signalling 608. The output-signal 624 can represent the operation or operations that were displayed to the user 604 on the display device 602 when the highest interest-level-score was detected.
The robot controller 636 can process the output-signal 624 and provide a robot-control-signal 640 to the robot 638. In one example, the robot controller 636 may use a database or look-up table to determine an appropriate robot-control-signal 640 based on the received output-signal 624. For instance the output-signal 624 may represent an identifier of a robotic operation, and the database / look-up table may provide the required robot-control-signal 640 that will cause the robot 638 to perform the intended robotic operation.

In this way, the system of Figure 6 can be considered as a machine support system, which includes a display device 602 that provides a visual stimulation of how a subsequent operation can be performed by at least one machine (such as the robot 638) in the machine support system. The system also generates a control-signal 612 for adjusting one or more features of the visual representation of how the subsequent operation can be performed. Optionally, the system can determine a selected-visual-stimulation (as represented by the output-signal 624 in Figure 6) as: (i) the visual stimulation that is provided by the display device 602 as part of the last iteration; or (ii) the visual stimulation that is associated with the highest interest-level-score. The system can then provide a machine-control-signal (as represented by the robot-control-signal 640) for automatically controlling the at least one machine / robot 638 based on the selected-visual-stimulation.

In this way, if a machine (robot 638) cannot determine how to do its task, the system can formulate a question by forming a visual image of different ways in which the task can be performed. The visual image is displayed to the human user 604. Based on the interest-level feedback from the human, the system can update the visual image that is displayed to the user 604 until a satisfactory image is found. The resulting image should indicate the correct way for the machine (robot 638) to perform the task. Advantageously, the system can then select the visual image from the plurality of images that were displayed to the user that attracted the highest interest, and automatically control the machine (robot 638) based on the selected image.

In this example, a text description 634, which can be for example voice commands, can be provided to the optimization algorithm 626. Such a text description 634 can be used in the same way as described above with reference to Figure 5.
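A minimal sketch of such a look-up table is shown below; the operation identifiers and command structures are illustrative assumptions, not values from the disclosure.

```python
# Illustrative sketch of the database / look-up table mapping. Operation
# identifiers and command structures are assumptions for illustration.
ROBOT_COMMANDS = {
    "pick_up_screwdriver":  {"action": "grasp", "target": "screwdriver"},
    "throw_to_plastic_bin": {"action": "place", "target": "plastic_bin"},
    "throw_to_iron_bin":    {"action": "place", "target": "iron_bin"},
}

def robot_control_signal(output_signal_id):
    # output_signal_id identifies the operation whose visual representation
    # attracted the highest interest-level-score.
    return ROBOT_COMMANDS[output_signal_id]

command = robot_control_signal("pick_up_screwdriver")
```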
By way of non-limiting example: assume a recycling robot 638 is not sure whether an object should go to a plastic bin or to an iron bin. A system associated with the robot 638 formulates these options as a plurality of visual images. The images in this case can be: (i) the object being thrown to the plastic bin; and (ii) the object being thrown to the iron bin. The human user 604 views several of those images very quickly and the system can determine the action with the highest interest score. Then the system can generate an appropriate robot-control-signal 640 that is sent to the robot 638 so that it can take the appropriate action.

In some examples, the display device 602 of Figure 6 that displays potential operations that can be performed by the robot 638 can be an augmented reality (AR) display device.
For instance, AR glasses can be used to display the potential operations to the user 604.

In some examples, the robot 638 can be a prosthetic limb that is attached to the user 604. In this way, the user's response to information that is presented to them by an output device can be used to automatically control the prosthetic limb.

It will be appreciated from the above description that the systems described herein can be used in one or more of the following applications:

(a) Searching for most satisfying visual designs (industrial design, fashion, web, etc.) considering a group of participants (a focus group).
(b) Searching for images of human faces in a person's brain for criminology applications such as identifying a suspect.
(c) Searching for audio-visual synthetic patterns to regulate brain state as a non-invasive therapy tool.
(d) Human-robot collaboration to control behavior of a robot by maximizing the response of the human user continually during the collaboration.
(e) Gaming applications to control the game using EEG brain signals by maximizing the interest level.
The examples that are described above mainly relate to use of EEG-signalling as the biometric-signalling. However, it will be appreciated that any biometric-signalling that can be used to determine an interest-level-score that is representative of a person's interest can be used. For instance, the one or more biometric sensors may include a pupil size sensor that provides pupil-size-signalling. The pupil size sensor may include a camera that obtains images of a user's eye. The pupil-size-signalling can be representative of the pupil size / dilation of the user's eye while they are being exposed to the audio and/or visual stimulation. One example of a known way of determining such pupil parameters is described in Chapter 4 (Pupil dilation) in the book "Eye Tracking in User Experience Design" (Paperback ISBN: 9780124081383). In some examples, the processor of a system described herein can use both EEG-signalling and pupil-size-signalling to determine an interest-level-score.
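As a sketch only, one simple way to combine the two modalities is a weighted sum of the per-modality scores; the weights below are illustrative assumptions, as the disclosure does not specify a fusion method.

```python
# Illustrative sketch: fusing EEG-derived and pupil-derived scores with a
# weighted sum. The weights are assumptions.
def combined_interest(eeg_score, pupil_score, w_eeg=0.7, w_pupil=0.3):
    return w_eeg * eeg_score + w_pupil * pupil_score
```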
Figure 7 illustrates schematically an example embodiment of a computer implemented method.
At step 750, the method involves processing biometric-signalling in order to determine an interest-level-score. As discussed in detail above, the biometric-signalling is representative of body measurements of a user while they are exposed to audio and/or visual stimulation.
At step 752, the method involves providing a control-signal to an output device based on the interest-level-score. Again, as discussed in detail above, the control-signal is for adjusting the audio and/or visual stimulation that is provided by the output device. This automatic adaptation of the audio and/or visual stimulation, based on the interest-level-score, can advantageously enable sophisticated searching operations to be performed that can be used to dynamically determine an output-signal that is not necessarily one of a predetermined, finite, group of candidate audio and/or visual stimulations.

Claims (16)

1. A system (100) comprising:
an output device (102) configured to provide an audio and/or visual stimulation (114) to a user (104);
one or more biometric sensors (106) that are configured to provide biometric-signalling (108), which is representative of body measurements of the user (104) while they are exposed to the audio and/or visual stimulation (114); and
a processor (110) configured to:
process the biometric-signalling (108) in order to determine an interest-level-score (222); and
provide a control-signal (112) to the output device (102) based on the interest-level-score (222), wherein the control-signal (112) is for adjusting the audio and/or visual stimulation (114) that is provided by the output device (102).
2. The system of claim 1, wherein the processor (210) is configured to iteratively process the biometric-signalling (208) and provide an updated control-signal (212) until:
a target interest-level-score is reached;
a pre-defined number of iterations are performed; or
a rate of change of the interest-level-score between multiple iterations is less than a target value.
3. The system of claim 2, wherein the processor (210) is configured to provide an output-signal (224) that is representative of:
the audio and/or visual stimulation that is provided by the output device as part of the last iteration; or
the audio and/or visual stimulation that is associated with the highest interest-level-score.
4. The system of claim 1, wherein the processor (210) is configured to:
iteratively process the biometric-signalling (208) to determine a plurality of interest-level-scores (222), wherein each interest-level-score (222) is associated with an instance of the audio and/or visual stimulation;
iteratively provide a plurality of control-signals (212) to the output device based on associated ones of the plurality of interest-level-scores (222);
determine one of the interest-level-scores (222) as a selected-interest-level-score by applying a function to the plurality of interest-level-scores (222); and
provide an output-signal (224) that is representative of the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score.
5. The system of claim 3 or claim 4, wherein the output-signal (224) comprises:
an identifier of the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score; and/or
the instance of the audio and/or visual stimulation that is associated with the selected-interest-level-score.
6. The system of any preceding claim, wherein:
the one or more biometric sensors comprise one or more electroencephalography (EEG) sensors (306), and
the biometric-signalling comprises EEG-signalling (308), which is representative of electrical activity in a user's brain while they are exposed to the audio and/or visual stimulation.
7. The system of any preceding claim, wherein:
the one or more biometric sensors (106) comprise one or more pupil size sensors, and
the biometric-signalling (108) comprises pupil-size-signalling, which is representative of the size/dilation of the user's pupil while they are exposed to the audio and/or visual stimulation.
8. The system of any preceding claim, wherein the processor (210) is configured to iteratively process the biometric-signalling (208) and provide a new control-signal periodically, for example every 0.5 seconds.
9. The system of any preceding claim, wherein:
the system is a facial reconstruction system;
the output device (502) is configured to provide a visual stimulation to the user (504) that represents a person's face; and
the control-signal is for adjusting one or more features of the person's face that is provided by the output device (502).
10. The system of any one of claims 1 to 8, wherein:
the system is a machine support system;
the output device (602) is configured to provide a visual stimulation to the user (604) that comprises a visual representation of how a subsequent operation can be performed by at least one machine (638) in the machine support system; and
the control-signal is for adjusting one or more features of the visual representation of how the subsequent operation can be performed.
11. The system of claim 10, wherein the processor is configured to:
determine a selected-visual-stimulation as:
the visual stimulation that is provided by the output device as part of the last iteration; or
the visual stimulation that is associated with the highest interest-level-score; and
provide a machine-control-signal (640) for automatically controlling the at least one machine (638) based on the selected-visual-stimulation.
12. The system of any preceding claim, wherein the processor comprises:
a biometric processor (316) that is configured to process the biometric-signalling (108) in order to determine the interest-level-score (222) by applying a deep artificial neural network.
13. The system of any preceding claim, wherein the processor comprises:
an optimizer (326) that is configured to generate a latent variable value (327) based on the interest-level-score (222) as part of providing the control-signal (312).
14. The system of claim 13, wherein the optimizer (326) is configured to:
receive descriptive data (334); and
process the descriptive data (334) when determining the latent variable value (327), such that the latent variable is prohibited from taking one or more values that are inconsistent with the descriptive data (334).
15. The system of claim 13 or claim 14, wherein the processor further comprises:
a latent-variable generative adversarial network model (328) that is configured to generate the control-signal based on the latent variable (327).
16. A computer implemented method comprising:
processing (750) biometric-signalling (108) in order to determine an interest-level-score (222), wherein the biometric-signalling (108) is representative of body measurements of a user (104) while they are exposed to audio and/or visual stimulation (114); and
providing (752) a control-signal (112) to an output device (102) based on the interest-level-score (222), wherein the control-signal (112) is for adjusting the audio and/or visual stimulation that is provided by the output device (102).
SE2050318A 2020-03-23 2020-03-23 A system SE2050318A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SE2050318A SE2050318A1 (en) 2020-03-23 2020-03-23 A system
PCT/EP2021/057214 WO2021191126A1 (en) 2020-03-23 2021-03-22 A system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
SE2050318A SE2050318A1 (en) 2020-03-23 2020-03-23 A system

Publications (1)

Publication Number Publication Date
SE2050318A1 (en) 2021-09-24

Family

ID=75339669

Family Applications (1)

Application Number Title Priority Date Filing Date
SE2050318A SE2050318A1 (en) 2020-03-23 2020-03-23 A system

Country Status (2)

Country Link
SE (1) SE2050318A1 (en)
WO (1) WO2021191126A1 (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110159467A1 (en) * 2009-12-31 2011-06-30 Mark Peot Eeg-based acceleration of second language learning
US20120296476A1 (en) * 2009-10-30 2012-11-22 Richard John Cale Environmental control method and system
WO2014138925A1 (en) * 2013-03-15 2014-09-18 Interaxon Inc. Wearable computing apparatus and method
US20150294149A1 (en) * 2004-12-06 2015-10-15 Id-U Biometrics Ltd. Multivariate Dynamic Biometrics System
US20160041535A1 (en) * 2014-08-07 2016-02-11 Goodrich Corporation Optimization of human supervisors and cyber-physical systems
US20160235323A1 (en) * 2013-09-25 2016-08-18 Mindmaze Sa Physiological parameter measurement and feedback system
US20170188933A1 (en) * 2014-05-30 2017-07-06 The Regents Of The University Of Michigan Brain-computer interface for facilitating direct selection of multiple-choice answers and the identification of state changes
US20180012009A1 (en) * 2016-07-11 2018-01-11 Arctop, Inc. Method and system for providing a brain computer interface
WO2018141061A1 (en) * 2017-02-01 2018-08-09 Cerebian Inc. System and method for measuring perceptual experiences
WO2019040665A1 (en) * 2017-08-23 2019-02-28 Neurable Inc. Brain-computer interface with high-speed eye tracking features
WO2019094953A1 (en) * 2017-11-13 2019-05-16 Neurable Inc. Brain-computer interface with adaptations for high-speed, accurate, and intuitive user interactions

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10009644B2 (en) * 2012-12-04 2018-06-26 Interaxon Inc System and method for enhancing content using brain-state data

Also Published As

Publication number Publication date
WO2021191126A1 (en) 2021-09-30

Similar Documents

Publication Publication Date Title
CN115004308A (en) Method and system for providing an interface for activity recommendations
Kosmyna et al. Adding Human Learning in Brain--Computer Interfaces (BCIs) Towards a Practical Control Modality
JP2021506052A5 (en)
Moher et al. A comparison of simple movement behaviors across three different devices
Zheng et al. Multiclass emotion classification using pupil size in vr: Tuning support vector machines to improve performance
US20210183509A1 (en) Interactive user system and method
US20230372190A1 (en) Adaptive speech and biofeedback control of sexual stimulation devices
Tavares et al. Physiologically attentive user interface for improved robot teleoperation
SE2050318A1 (en) A system
Cang et al. Discerning affect from touch and gaze during interaction with a robot pet
CA3233781A1 (en) Mental health intervention using a virtual environment
US11596573B2 (en) Control of sexual stimulation devices using electroencephalography
US20220351822A1 (en) Techniques for executing and modifying transient care plans via an input/output device
US20220331196A1 (en) Biofeedback-based control of sexual stimulation devices
WO2023104519A1 (en) User personality traits classification for adaptive virtual environments in non-linear story paths
Al-Omair et al. An emotional support robot framework using emotion recognition as nonverbal communication for human-robot co-adaptation
US20210063972A1 (en) Collaborative human edge node devices and related systems and methods
Gagnon et al. Comparing methods for assessing operator functional state
US11429188B1 (en) Measuring self awareness utilizing a mobile computing device
Wang et al. Multi-objective squirrel search algorithm for EEG feature selection
US12040089B1 (en) Dynamic and targeted allocation of resources for coaching service
US20240145065A1 (en) Apparatuses, systems, and methods for a real time bioadaptive stimulus environment
EP3846177A1 (en) An interactive user system and method
González Fabián Software Architectural Design of a Safe Brain-Machine-Interface
Lucas-Pérez et al. Personalising the Training Process with Adaptive Virtual Reality: A Proposed Framework, Challenges, and Opportunities

Legal Events

Date Code Title Description
NAV Patent application has lapsed