GB2584827A - Multilayer set of neural networks - Google Patents


Info

Publication number
GB2584827A
GB2584827A
Authority
GB
United Kingdom
Prior art keywords
voice call
words
call
score
fraudulent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1907476.4A
Other versions
GB201907476D0 (en)
Inventor
Charles Herrema Simon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ai First Ltd
Original Assignee
Ai First Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ai First Ltd filed Critical Ai First Ltd
Priority to GB1907476.4A
Publication of GB201907476D0
Publication of GB2584827A
Legal status: Withdrawn (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/1822: Parsing for meaning understanding
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441: Countermeasures against malicious traffic
    • H04L 63/1483: Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/30: Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L 63/304: Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting circuit switched data communications
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/30: Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L 63/306: Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting packet switched data communications, e.g. Web, Internet or IMS communications
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 3/00: Automatic or semi-automatic exchanges
    • H04M 3/22: Arrangements for supervision, monitoring or testing
    • H04M 3/2281: Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 12/00: Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W 12/80: Arrangements enabling lawful interception [LI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00: Payment architectures, schemes or protocols
    • G06Q 20/38: Payment protocols; Details thereof
    • G06Q 20/40: Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q 20/401: Transaction verification
    • G06Q 20/4016: Transaction verification involving fraud or risk level assessment in transaction processing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/18: Artificial neural networks; Connectionist approaches
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M 2203/55: Aspects of automatic or semi-automatic exchanges related to network data storage and management
    • H04M 2203/551: Call history
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M 2203/55: Aspects of automatic or semi-automatic exchanges related to network data storage and management
    • H04M 2203/555: Statistics, e.g. about subscribers but not being call statistics
    • H04M 2203/556: Statistical analysis and interpretation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M 2203/55: Aspects of automatic or semi-automatic exchanges related to network data storage and management
    • H04M 2203/558: Databases
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M 2203/60: Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M 2203/6027: Fraud preventions
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M 2203/60: Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M 2203/6054: Biometric subscriber identification
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 3/00: Automatic or semi-automatic exchanges
    • H04M 3/42: Systems providing special services or facilities to subscribers
    • H04M 3/42025: Calling or Called party identification service
    • H04M 3/42034: Calling party identification service
    • H04M 3/42059: Making use of the calling party identifier

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Continuous speech is analysed to detect whether a voice call is fraudulent, malicious or nuisance. Words 202 are identified in speech from the caller and the recipient, and the words are processed with an analysis engine 200 to identify their context. A score is generated that indicates a risk of the call being nuisance, fraudulent or malicious. If the score exceeds a predetermined threshold then an action is taken during the voice call, such as ending the call, recording the call, or warning the user. In an embodiment a neural network is used where a first layer 204 detects concepts 206, a second layer 210 detects intentions, a third layer 212 detects conversational intents and a fourth layer 214 generates a score. An assistant layer 208 may provide past concepts and past intentions to the second and third layers to provide better context.

Description

MULTILAYER SET OF NEURAL NETWORKS
The present disclosure relates to monitoring voice calls, in particular, but not exclusively, to monitoring telephone calls and analysing the call for potential fraudulent, malicious and/or nuisance activity.
Speech recognition systems are known in the art. Such systems enable automatic conversion from spoken words into text which may be used by or displayed on an electronic device.
Speech recognition systems may be used to initiate pre-determined actions. Such systems may initiate an action upon recognition of a key word or words. A speech recognition system may be used to detect specific answers, such as "yes" or "no", to set questions as part of an automated call answering system. Such systems are often operable to execute an action for each incoming call.
Fraudulent schemes conducted through telephone calls have increased in number over time. Fraudulent callers often rely on scripts comprising a number of methods of convincing a potential victim into providing them with details such as bank account details or the like. These scripts are constantly changing over time in order to avoid detection by authorities and to find ever more convincing methods of deceiving potential victims.
One prior art system is shown in US2016/0150092.
According to a first aspect of the invention there is provided a system arranged to analyse continuous speech within a voice call to determine if the voice call is fraudulent, malicious and/or nuisance. The system will typically be arranged to use processing circuitry to perform one or more of the following: i) digitise audio signals containing the continuous speech, from the caller and the recipient of the voice call, to generate audio data and to identify words contained within the audio data; ii) process the identified words with an analysis engine which is arranged to identify the context in which the words are used; iii) generate a score, according to a set of metrics, based on the analysis performed by the analysis engine. The calculated score, based upon the analysis performed by the analysis engine, may be indicative of the risk that the telephone call is fraudulent, malicious or nuisance.
The system may be arranged to execute an action, during the voice call, upon the calculated score reaching a pre-determined threshold, and may be arranged to allow the metrics to be updated as the voice call is analysed.
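The loop of identifying words, scoring them against a set of metrics and acting when a threshold is reached can be sketched as follows. This is a minimal illustration only: the names (`RISK_THRESHOLD`, `monitor_call`) and the toy per-word weights are assumptions for the example, not part of the patent, and the real system scores context rather than bare keywords.

```python
# Illustrative only: a keyword-weight table stands in for the analysis engine.
RISK_THRESHOLD = 0.8
SUSPICIOUS_WORDS = {"password": 0.4, "urgent": 0.3, "transfer": 0.3}

def score_words(words):
    """Toy metric: accumulate per-word risk weights, capped at 1.0."""
    return min(1.0, sum(SUSPICIOUS_WORDS.get(w.lower(), 0.0) for w in words))

def monitor_call(word_stream, on_alert):
    """Re-score after every identified word; act as soon as the threshold is hit."""
    seen = []
    for word in word_stream:
        seen.append(word)
        score = score_words(seen)
        if score >= RISK_THRESHOLD:
            on_alert(score)  # e.g. warn the user, record or end the call
            return score
    return score_words(seen)

alerts = []
final = monitor_call(
    "please transfer the money urgent give me your password".split(),
    alerts.append,
)
print(final, len(alerts))
```

Note that the action fires during the call, at the moment the running score crosses the threshold, rather than after the call has ended.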
Such a system is believed advantageous, in part, because it is able to process both sides of the voice call which is believed to allow for more accurate analysis of the content of the call.
Conveniently, the system is arranged such that the calculated score depends on historic content from the voice call in addition to the identified words.
The system may be arranged such that the calculated score includes any of the following factors in its calculation: the occurrence of predetermined words, phrases and/or sentences; the order of words, phrases and/or sentences; the intonation, or change of intonation, in the voice call; volume change(s) within the voice call; frequency of change of intonation; repetition of change of intonation; frequency of change of volume; repetition of change of volume; frequency of key words; frequency of words within a given topic; repetition of words; recognition of a voice (eg biometric) profile; length of the call; repeated calls from the same voice profile; time of day of the call; the day of the week; etc. The executed action may be any one or more of: terminating the voice call; forwarding the voice call to one or more telephone numbers stored in a database; recording at least a portion of the voice call; and indicating to a user that the voice call may be fraudulent.
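One hedged way to picture how several of the listed factors might feed a single score is a weighted sum over normalised factor values. The weights and factor names below are invented for illustration; the patent does not give a formula.

```python
# Assumed illustrative weights; each factor value is normalised to [0, 1].
FACTOR_WEIGHTS = {
    "keyword_frequency": 0.4,
    "intonation_changes": 0.2,
    "volume_changes": 0.1,
    "known_fraud_voice_profile": 0.3,
}

def risk_score(factors):
    """Weighted sum of factor values; result lies in [0, 1] as weights sum to 1."""
    return sum(FACTOR_WEIGHTS[name] * value for name, value in factors.items())

score = risk_score({
    "keyword_frequency": 0.9,
    "intonation_changes": 0.5,
    "volume_changes": 0.0,
    "known_fraud_voice_profile": 1.0,
})
print(round(score, 2))  # 0.4*0.9 + 0.2*0.5 + 0.3*1.0 = 0.76
```

Because the factors are kept separate from the combination rule, individual weights can be tuned without reworking how each factor is measured.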
The digitised audio signals may include any of the following: letters, words, numbers and key tones.
Conveniently the system is arranged to compare the digitised audio signals against data stored in at least one database.
The system may be arranged such that the database is automatically updated on the analysis of previous telephone calls.
The system may be arranged such that the or each database is operable to be manually updated.
According to a second aspect of the invention there is provided a machine readable medium containing instructions which when executed cause a mobile telephone to operate with the system of the first aspect.
According to a third aspect of the invention there is provided a machine readable data carrier containing instructions which when read by a computer cause that computer to provide the system of the first aspect.
According to a fourth aspect of the invention there is provided a method arranged to analyse continuous speech within a voice call to determine if the voice call is fraudulent, malicious and/or nuisance, the method comprising at least one of the following steps, using processing circuitry arranged to: 1) digitise audio signals containing the continuous speech, from the caller and the recipient of the voice call, to generate audio data and to identify words contained within the audio data; 2) process the identified words with an analysis engine which is arranged to identify the context in which the words are used; and 3) generate a score, according to a set of metrics, based on the analysis performed by the analysis engine; wherein the calculated score is indicative of the risk that the telephone call is fraudulent, malicious or nuisance; and the method may be arranged to execute an action, during the voice call, upon the calculated score reaching a pre-determined threshold.
Conveniently, the method is arranged to allow update of the metrics as the voice call is analysed.
The machine readable medium referred to in any of the above aspects of the invention may be any of the following: a CDROM; a DVD ROM/RAM (including -R/-RW or +R/+RW); a hard drive; a memory (including a USB drive, an SD card, a compact flash card or the like); a transmitted signal (including an Internet download, ftp file transfer or the like); a wire; etc. Features described in relation to any of the above aspects of the invention may be applied, mutatis mutandis, to any of the other aspects of the invention.
The skilled person will appreciate that many portions of the technology described herein may be provided by software, hardware and/or firmware, or any combination thereof.
Example embodiments will now be described with reference to the accompanying drawings, in which: Figure 1 schematically shows components of an example system; Figure 2 shows a diagrammatic representation of a system arranged to determine the probability that a telephone call is fraudulent, malicious or the like; Figure 3 schematically shows further details of an analysis engine of Figure 2; and Figure 4 shows a flow chart of the system shown in Figure 1. The embodiments described herein relate to a system arranged to determine the risk that a telephone call is fraudulent, malicious, nuisance or the like.
It is believed that the call monitoring system will be used personally, but it may also be used by businesses, linguists and fraud prevention authorities.
In the embodiment being described and shown in Figure 1, a system 100 comprises a first server 102 and a second server 104 to which a remote processing device 106a, b, c, d, e can make a connection across a network 108. In the following text, it is convenient to refer to the processing devices as 106 with no following letter if the context supports this reference, or with a following letter if a specific reference is being made to one of the illustrated devices.
An analysis engine 105 is provided in at least one of the server(s) 102, 104.
The skilled person will appreciate that the system 100 might contain a number of servers 102, 104 other than the two shown in the embodiment being described. There may for instance be a single server 104 or more than two servers 102, 104. The servers 102, 104 need not be located close to one another and may be separated by significant distances.
In the embodiment being described, the network 108 is a Wide Area Network (WAN) and indeed is the Internet. The skilled person will appreciate that there may be other WANs, etc. that would be suitable for implementing the technologies described herein.
The processing device 106c is a smartphone such as an Apple™ iPhone™, a Samsung™ Galaxy™, or any other such phone including those running Android™, iOS™, Windows™ Mobile, Blackberry™, etc. To aid understanding of the embodiment being described, some of the internal components are schematically shown.
Processing circuitry 110 is connected to an internal memory 112 which contains instructions to cause the processing circuitry to function and also contains data including an internal database 114. The internal database 114 may be used should a portion of the system be implemented on the processing circuitry 110 and in such embodiments the internal database 114 may act as the database 107 described elsewhere.
The processing circuitry 110 is also connected to a speaker 116, or other such sound producer, and a microphone 118, or other sound detector. The skilled person will appreciate that the smartphone 106c is useable as a telephone to provide a voice call with the speaker 116 and the microphone 118 being used to generate sound and input sound respectively.
The processing devices 106a, 106b are also provided and are shown with access to the network 108. The devices 106a, 106b may also be smartphones but alternatively they may be tablets (such as an iPad™, or other Android™ tablet, etc), a laptop computer, a desktop computer, a television (including a smart television), or any other such suitable device capable of making a voice call in addition to providing an embodiment.
The skilled person will also appreciate that voice calls can be made over networks other than the TCP/IP protocol used by the Internet 108. The skilled person will appreciate the further networks, but by way of further example Figure 1 shows a PSTN (Public Switched Telephone Network) 120 to which a telephone 122 is connected. Further, a mobile telephone network 124 shows a further example, connecting two further processing apparatus 106d, e to the smartphone 106c. The mobile telephone network may be any suitable mobile telephone network such as GSM (Global System for Mobile communication), GPRS (General Packet Radio Service), UMTS (Universal Mobile Telecommunications System), LTE (Long Term Evolution) or the like.
As the skilled person will appreciate from the following, embodiments of this technology allow a voice call to be monitored regardless of the underlying transport mechanism for that voice call. Figure 1 attempts to exemplify a selection of underlying transport technologies but the figure is not intended to be exhaustive. Indeed, embodiments may be arranged to work with an analogue transport mechanism rather than the packetised transport mechanisms shown in Figure 1.
In use, the system of Figure 1 is arranged to analyse continuous speech within a voice call, commonly referred to as a telephone call. However, as will be evident from Figure 1, the voice call may be provided by a variety of carrier technologies and may for example be a Skype™ or WhatsApp™ voice call or the like.
When a voice call is connected, the processing circuitry 110 is arranged to make a data connection, across the network 108 (or otherwise) to the server(s) 102, 104. Additionally, the processing circuitry 110 is arranged to monitor the speaker 116 and the microphone 118 of the processing unit 106 being used and to digitise the audio signals sent to the speaker 116 or generated at the microphone 118, and so generate audio data representing the audio call (step 400). The voice call proceeds without an impact on the user and audio data is generated which contains a digitised version of the continuous speech.
However, whilst the voice call is proceeding the processing circuitry 110 transmits the digitised speech (ie the audio data) of the voice call to the server(s) 102, 104 for ongoing processing (step 402). In other embodiments the processing device 106 may be arranged to perform some or all of the processing that is now described. Indeed, it is conceivable that some embodiments may transmit the voice call to the server(s) 102, 104 before the voice call is digitised.
It is convenient for the functionality of the processing device 106, such as the smartphone 106c, to be provided by software running on that processing device. In the case of a smartphone 106c this may be loaded on to the smartphone 106c as an App. The skilled person will also appreciate that the functionality could also be provided in the firmware, or hardware of the processing devices 106, or a combination of software (eg an App), firmware and/or hardware.
The server(s) 102, 104 are arranged to receive the digitised voice call. That is, the server(s) 102, 104 are arranged to receive the digitised audio signals generated by the processing circuitry 110 from the voice call.
The analysis engine 105 within the server(s) 102, 104 is arranged to process the digitised audio signals received from the processing device 106.
In the embodiment being described, the digitised audio is first processed by the analysis engine 105 to identify individual words within the audio data 300 (step 404). That is, the embodiment being described may be thought of as generating a transcription of the digitised audio 300 which transcription is subsequently processed as described below in part to identify the context in which the words are used. As such, the embodiment being described comprises a transcription engine 302 which is arranged to generate the words that are contained within the audio data 300.
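The transcription stage just described, audio data in and a word stream out, can be sketched as below. `fake_speech_to_text` is a hypothetical stand-in for a real speech-to-text engine, which the patent does not specify.

```python
def fake_speech_to_text(audio_chunk):
    # Placeholder: a real implementation would run ASR on the raw samples.
    return audio_chunk["transcript"]

def words_from_audio(audio_chunks):
    """Yield individual words from successive chunks of digitised audio."""
    for chunk in audio_chunks:
        for word in fake_speech_to_text(chunk).split():
            yield word

chunks = [
    {"transcript": "I want you"},
    {"transcript": "to provide account information"},
]
result = list(words_from_audio(chunks))
print(result)
```

Making this a generator matters for the described system: words can be fed to the analysis engine as the call proceeds, rather than after it ends.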
In the described embodiment, the analysis engine 105 comprises a neural network 200, further details of which are shown in Figure 2. The analysis engine 105 also comprises or has access to a database which contains data to be used in the analysis performed by the analysis engine, described hereinafter. Additionally, in the embodiment being described, the analysis engine 105 comprises additional components, shown in Figure 3 and described in more detail below, which in combination with the neural network 200, generate a score as to whether the voice call in progress is likely to be fraudulent, malicious and / or nuisance.
First Layer.
The words 202 identified within the audio data are input to a first layer 204 of the neural network 200. The first layer 204 is arranged to generate, as an output 206, a vector which specifies concepts contained in the input words 202. Each concept is created with the name of the most frequent word used to generate the concept. Here, a concept may be thought of as a label for what the words relate to. For example, should the words be discussing a bank account (ie relate to a bank account) then the concept may be labelled 'bank-account'.
In training, text is provided to the neural network 200 containing concepts that are expected to occur within voice calls analysed by the system 100. Accordingly, the skilled person will appreciate that the selection of the training data will determine the domain of voice calls that the system 100 can process.
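As a rough illustration of the first layer's role, a keyword lookup below stands in for the trained network, with invented concept labels following the 'bank-account' naming convention described above; the real layer learns this mapping from training text.

```python
# Assumed illustrative concept vocabulary; not taken from the patent.
CONCEPT_KEYWORDS = {
    "bank-account": {"bank", "account", "iban", "sort-code"},
    "credentials": {"password", "pin", "login"},
}

def concept_vector(words):
    """Return, per concept (in sorted label order), the fraction of its keywords seen."""
    seen = {w.lower() for w in words}
    return [
        len(seen & keywords) / len(keywords)
        for _, keywords in sorted(CONCEPT_KEYWORDS.items())
    ]

vec = concept_vector("give me your bank account password".split())
print(vec)
```

The output is a fixed-length vector over concepts, which is the shape of signal the later layers consume.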
Assistant Layer.
The neural network 200 also comprises an assistant layer 208, which may be thought of as a supervisory layer.
The assistant layer 208 is in charge of time management and is arranged to assign weights to the concepts 206 identified by the first layer 204. In the embodiment being described, the weights are assigned to the concepts dependent on the time that has elapsed since the concept occurred in the voice call. For example, if a concept is raised early in a voice call and then not used again, that concept is likely to be of less importance. However, if a concept is repeatedly used within the voice call then it is likely to be of greater importance and is given a higher weight. Thus, it will be seen that the assistant layer 208 allows the score that is being generated to depend on the historic content of the voice call.
In some embodiments, including that being described, the assistant layer 208 is arranged to assign greater weights to predetermined words. Here, the predetermined words may be words that have been found to be likely used in a fraudulent, malicious and/or nuisance voice call. The presence of such predetermined words may be thought of as increasing the likelihood that such a call is fraudulent, malicious and/or nuisance. Thus, the use of an increased weighting in such embodiments attempts to increase the accuracy of the system.
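The assistant layer's weighting could be modelled, purely as an assumption since the patent gives no formula, as time decay combined with a repetition term and a boost for predetermined high-risk concepts:

```python
import math

# Assumed high-risk concept set and half-life; illustrative values only.
HIGH_RISK = {"credentials"}

def concept_weight(seconds_since_last_use, occurrences, concept, half_life=60.0):
    """Weight decays with elapsed time, grows with repetition, boosted if high-risk."""
    decay = 0.5 ** (seconds_since_last_use / half_life)
    repetition = math.log1p(occurrences)
    boost = 2.0 if concept in HIGH_RISK else 1.0
    return decay * repetition * boost

# A concept mentioned once, five minutes ago, matters less...
stale = concept_weight(300, 1, "bank-account")
# ...than a high-risk concept repeated moments ago.
fresh = concept_weight(5, 4, "credentials")
print(stale < fresh)
```

Keeping this bookkeeping outside the network, as the text argues, means the network itself need not learn to track elapsed time.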
It can be seen, from Figure 2, that the assistant layer 208 may be viewed as a layer running alongside the other layers of neurons. An advantage of an embodiment using the assistant layer 208 is that it frees the neural network from tracking the weights related to time and use of a concept, which helps to reduce the number of intermediate neurons.
In some embodiments, including the one being described, the assistant layer 208 is arranged to input concepts into a second layer 210, described hereinafter, which concepts have not been generated by the first layer 204 but which have appeared in previous sentences. Embodiments employing such an arrangement have been found to help the NN 200 include interpretations that depend on previous words and to modify the meaning of future sentences.
The concepts that are input to the second layer 210 are typically of reduced weight when compared to the concepts output from the first layer 204. This reduced weight is appropriate because concepts generated by the assistant layer 208 relate to parts of the voice call that are earlier in the call (ie further in the past) and therefore carry less importance than the concepts output from the first layer 204, which are more immediate. The assistant layer, in the embodiment being described, is arranged to manage the weight of each concept taking into account how frequently it is used in the final intents determined by later layers, as described below.
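A sketch of how past concepts might be merged, at reduced weight, with the first layer's current concepts before reaching the second layer. The discount value and concept names are illustrative assumptions:

```python
PAST_DISCOUNT = 0.5  # illustrative: past concepts count for half

def merge_concepts(current, past):
    """Combine current concept weights with discounted weights from earlier in the call."""
    merged = dict(current)
    for concept, weight in past.items():
        merged[concept] = merged.get(concept, 0.0) + PAST_DISCOUNT * weight
    return merged

current = {"bank-account": 1.0}
past = {"credentials": 0.8, "bank-account": 0.4}
merged = merge_concepts(current, past)
print(merged)
```

A concept mentioned both now and earlier ('bank-account' here) ends up weighted above either occurrence alone, matching the repetition argument in the text.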
Second Layer.
The concepts 206 generated by the first layer 204, together with those supplied by the assistant layer 208, are input to the second layer 210.
The outputs of the second layer 210 are what may be thought of as the intents; ie the second layer 210 is trained to detect intentions derived from the voice call as it is being processed (step 410).
The second layer 210 is trained using full statements that are input to the first layer 204. Training of the first layer 204 is de-activated whilst the second layer 210 is being trained.
Embodiments that arrange the second layer 210 to determine intent in this manner are advantageous as this aids the understanding of complex sentences. The second layer 210 is arranged such that it is able to detect multiple intents within a portion of the voice call that is being analysed. Here the portion of a voice call may be a sentence, or sentences that are close to one another, such as neighbouring sentences.
If multiple intents are detected as being possible then the second layer 210 does not choose which intent should be processed and outputs all of the detected intents to a third layer 212 of the neural network 200. It is believed that embodiments arranged in this manner, including the embodiment being described, are advantageous over the prior art, which can sometimes force selection of a 'best intent', thereby reducing the effectiveness of such systems.
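The point about not forcing a single 'best intent' can be pictured as multi-label thresholding rather than an argmax over intent activations. The intent names, activations and threshold below are invented for illustration:

```python
INTENT_THRESHOLD = 0.5  # assumed value

def detect_intents(activations):
    """Return every intent above threshold, not just the single strongest one."""
    return {intent for intent, a in activations.items() if a >= INTENT_THRESHOLD}

activations = {
    "request-credentials": 0.81,
    "create-urgency": 0.66,
    "small-talk": 0.12,
}
detected = detect_intents(activations)
print(sorted(detected))
```

An argmax here would discard 'create-urgency' entirely; passing both intents downstream preserves information for the third layer to resolve.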
Third Layer.
The assistant layer 208 is further arranged to input past intents that do not occur in the current portion of the voice call that is being processed.
The embodiment being described is arranged such that the current portion of the voice call is the present sentence that is being processed, which is built up as described below. However, other embodiments may be arranged differently; for example, the current portion may be a predetermined number of words, a predetermined time window, or the like.
Thus, in the embodiment being described, the output of the transcription engine 302 provides information on the current portion, which is based around the sentence being spoken. The system may be arranged to take the current text as the current portion and to concatenate new text onto the current portion unless it is determined that a new sentence has begun.
For example, we could have as input: Current text: I; Current text: I want you (concatenated onto 'I'); Current text: I want you to provide (concatenated onto 'I want you'); Current text: I want you to provide account information (concatenated onto 'I want you to provide').
Once it is deemed that a sentence has finished, the system, in the embodiment being described, is arranged to start building up a new sentence. For example: Current text: This is important; Current text: This is important to move (concatenated onto 'This is important'); Current text: This is important to move forward (concatenated onto 'This is important to move').
Thus, the system, in the embodiment being described, is arranged to analyse the current text and yet use information derived from previous occurrences of the 'current text'. In the described embodiment, the system is thus arranged to analyse the current sentence using information, via the assistant layer 208, derived from previous sentences.
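The 'current text' examples above amount to accumulating fragments into a sentence and starting afresh when the sentence ends. A naive sketch follows; the patent does not specify how sentence boundaries are detected, so a trailing full stop stands in for that decision here:

```python
def build_sentences(fragments):
    """Concatenate fragments into the current sentence; start a new one at a boundary."""
    sentences, current = [], ""
    for fragment in fragments:
        current = (current + " " + fragment).strip()
        if current.endswith("."):  # naive stand-in for end-of-sentence detection
            sentences.append(current)
            current = ""
    if current:  # keep any unfinished sentence
        sentences.append(current)
    return sentences

frags = [
    "I want you",
    "to provide account information.",
    "This is important",
    "to move forward.",
]
sents = build_sentences(frags)
print(sents)
```

Each completed sentence would then be analysed as the current portion, while earlier sentences feed the assistant layer as history.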
As with the past concepts that are input, by the assistant layer 208, into the second layer 210, these past intents arise from earlier in the voice call. Accordingly, the neural network is arranged to reduce the weight applied to the past intents when compared to the current intents generated by the second layer 210.
The third layer 212 is arranged and trained to manage intents inside a conversation and the layer is trained using complete conversations.
In an example conversation between a scammer and a user, the conversation may be along the following lines: Scammer: I need you to give me more information to protect your account.
User: Like what? Scammer: Your user and password is more than enough to let me help you.
In the embodiment being described, the third layer is arranged to use the last sentence as the current sentence for analysis, but will use previous concepts like "more information" and "your account" which have been picked up in earlier parts of the conversation.
Fourth Layer.
It is conceivable that the third layer 212 of the NN identifies intents that have dependencies between them, are in conflict or the like. In such examples, the output of the third layer 212 may be thought of as being in conflict and making little sense. As such, the fourth layer 214 is used to try and resolve ambiguities in the output of the third layer 212.
Embodiments utilising a fourth layer, such as the one being described, are advantageous because they allow the NN 200 to analyse voice calls where the context of a conversation changes quickly, etc. The training of the fourth layer is performed using complex conversations where the intents contained in the conversation are mixed, change quickly and the like.
In the embodiment being described, the analysis engine 105 is arranged such that the metrics upon which the score is calculated can be updated during a call. Such an arrangement helps the system react to new techniques used to try and trick users and allows the system to be more responsive than other embodiments which do not allow such rapid change. The skilled person will appreciate that a portion of the analysis engine is provided by a Neural Network 200, which ordinarily has a training phase and a use phase; as such, a Neural Network cannot ordinarily be updated in real time (here, real time means within the context of an ongoing call).
The skilled person will appreciate that a voice call, such as a telephone call, will ordinarily last from a few seconds to a few minutes. As such, real time, in the context of the embodiments described herein, is on the order of roughly any of the following: a few seconds, 30 seconds, a minute, 5 minutes, 10 minutes.
In order to allow the update of the metrics during the call, the embodiment being described is arranged to allow the Neural Network 200 that is being accessed to be changed during execution of the method (ie during run-time). That is, the embodiment being described comprises a plurality of neural networks, at least one of which can be trained as the system is executed and at least one of which can be used to analyse the voice call.
To allow switching between neural networks 200, multiple threads are run at any one instant and the system is arranged such that different neural networks may be accessed by each of the threads.
Further, data generated by the analysis of the voice call is captured within the database 107 for each voice call that is analysed, and such an arrangement allows the neural network 200 to be changed and the next portion of the text to be analysed.
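The run-time switching between neural networks described above may be sketched, purely for illustration, as follows. The `ActiveModel` holder, the lambda stand-ins for trained networks and the training thread are all illustrative assumptions, not part of the described embodiment:

```python
# Hypothetical sketch of swapping the neural network used for analysis at
# run-time: one thread analyses the call with the currently published
# network while another trains a replacement, which is then published.
import threading

class ActiveModel:
    """Holds a reference to the network currently used for analysis."""
    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._model

    def swap(self, new_model):
        # Publish the newly trained network; in-flight analyses keep the
        # reference they already obtained, the next portion uses the new one.
        with self._lock:
            self._model = new_model

def analyse(active, portion):
    return active.get()(portion)

active = ActiveModel(lambda text: 0.1)   # stand-in "network": text -> score
before = analyse(active, "hello")

def retrain():
    active.swap(lambda text: 0.9)        # replacement network trained mid-call

t = threading.Thread(target=retrain)
t.start(); t.join()
after = analyse(active, "hello")
print(before, after)  # → 0.1 0.9
```

Because the analysis thread only ever reads the currently published reference, each new portion of text can be analysed by whichever network was most recently installed, as the embodiment requires.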
As is illustrated with the aid of Figure 3, there is provided a score generation unit 304 which is arranged to generate a final score of the analysis performed on the ongoing voice call.
The analysis engine 105, in addition to the neural network 200, also comprises a biometric unit 306 which also makes an input into the score generation unit 304. The biometric unit 306 processes the digitised audio signal 300 rather than the words that are input into the neural network 200. Being a digitised version of the audio signal, as opposed to the words, the audio signal 300 still comprises tone, etc. which can be used to further generate a score of the likelihood of the voice call being fraudulent, malicious and/or nuisance.
If the final score, generated by the score generation unit 304, exceeds a predetermined threshold then the system is arranged to perform a predetermined action. The skilled person will appreciate that the final score may be thought of as a confidence level that the call is fraudulent, malicious and / or nuisance. In one embodiment, the system is arranged such that as analysis of the call proceeds the score can be incremented by processing of the analysis engine. Other embodiments may be arranged to decrement the score, or perform other manipulations.
The embodiment being described is arranged such that the fourth layer 214 of the neural network 200 outputs a score to the score generation unit 304. The skilled person will appreciate that many factors could influence whether it can be determined that the call should be held to be fraudulent, malicious and/or nuisance.
In the embodiment being described, the neural network 200 is arranged such that: * A higher score is given if predetermined words are detected within the voice call. The skilled person will appreciate that certain words are more likely to be used in a fraudulent, malicious and / or nuisance call than others.
* The second layer 210 of the neural network 200 is arranged, as described above, to detect concepts. In the embodiment being described, the neural network 200 is arranged to increase the score as a function of the number of words located in each concept. Here, it can be helpful to think of the concept as a topic and there may be many words that fall into that topic. For example, the topic / concept might be 'bank account' and words that fall into that concept might be any of the following examples: number, account, bank, sort code, branch, etc. It is likely that as the number of words increases the importance of that topic also increases and thus, if that topic is relevant to the call being fraudulent, malicious and / or nuisance then the score should be increased.
* In the embodiment being described the neural network is also arranged such that the age of the topic within the voice call also has an influence on the score. In particular, the system is arranged such that as the age of the topic increases within the voice call then the importance of that topic is consequently reduced and that topic does not have such a large influence on the score.
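The two influences described above, a topic's score contribution growing with the number of matched words and shrinking with the topic's age within the call, may be sketched as follows. The linear weighting and decay rate are illustrative assumptions only:

```python
# Illustrative sketch: score contribution of one topic grows with the number
# of words matched to it and is attenuated as the topic ages within the call.

def topic_contribution(word_count, age, base=1.0, decay=0.1):
    """Contribution of one topic to the score: proportional to matched
    words, reduced as the topic ages (age measured e.g. in sentences)."""
    freshness = max(0.0, 1.0 - decay * age)
    return base * word_count * freshness

# 'bank account' topic: 4 matched words, mentioned 2 sentences ago
print(topic_contribution(4, 2))             # → 3.2
# the same topic much later in the call counts for less
print(round(topic_contribution(4, 8), 2))   # → 0.8
```

With this shape, an old topic eventually stops influencing the score altogether, matching the behaviour described for the embodiment.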
Embodiments may be arranged to adjust the score being generated by the analysis performed by the analysis engine together with the rest of the system according to any of the following factors. Each of these factors may be thought of as being a metric against which a score can be generated for the voice call being processed and as such, a set of metrics is provided.
1) The occurrence of predetermined words, phrases and sentences.
Including, and in particular, formal introductory language and formal language constructions that convey the impression of authority from the incoming side of the call, which can often be used in fraudulent, malicious or nuisance calls.
This may also include citing organisations which are commonly used as narrative 'frames' within telephone frauds, such as the banks, the police or the tax authorities.
2) The order of words, phrases and sentences within any conversation.
3) Intonation and change of intonation during the conversation/s within the voice call, which may include a diminished level of contractions, indicating a higher incidence of formal language.
4) The volume and change in volume of spoken words, phrases and sentences. Telephone fraud depends on creating false realities, usually within limited time frames, and defining elements of stress. These include both implied and overt threats.
5) The frequency of the change in intonation during the conversation. 6) The repetition of the change in intonation during the conversation.
7) The frequency in the change in volume during the conversation.
8) The repetition in the change in volume during the conversation.
9) The frequency of key words, phrases and sentences. Key words and phrases may include commands and demands, as the fraudster uses stress to push the victim towards the intended outcome of divulging personal or financial information, or transferring funds into a criminal-controlled account.
10) The repetition of key words, phrases and sentences. Repetition is a key trope, as it helps drive the victim towards the intended outcome.
11) The recognition of a particular voice during a conversation. The biometric unit 306 is arranged such that it can recognise, via a biometric profile of known fraudsters, the voice of that fraudster. As such, if that biometric profile is recognised a high score may be assigned to the call being fraudulent, malicious and / or nuisance. Indeed, some embodiments may be arranged such that recognition of such a biometric profile is enough in itself to take the score to the predetermined threshold for an action to be taken.
Similarly, the biometric unit 306 may also be arranged to recognise a 'friendly' biometric profile which may be used to reduce the likelihood of the score reaching the pre-determined threshold for action to be taken.
12) The length of a call is a metric, as telephone fraud often takes several hours to complete.
13) A repeat of a call from the recognised voiceprint and/or CLI (Caller Line Identifier; eg typically the telephone number), where any key metrics have been registered towards a threat analysis.
14) The time of day for the call. Telephone fraud narratives are often inconsistent with the hours kept by legitimate organisations.
15) The day of the week for the call, as above.
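The factors enumerated above form the set of metrics against which the call is scored, and the embodiment allows that set to be updated during the call. Purely as a sketch, with metric names and weights that are illustrative assumptions rather than values from the embodiment, this might look like:

```python
# Illustrative sketch of holding a set of metrics and accumulating a
# per-call score from them. Keeping the metrics in a plain mapping means
# they can be updated while the call is still in progress.

class MetricSet:
    def __init__(self, weights):
        self.weights = dict(weights)   # metric name -> weight

    def update(self, name, weight):
        """Update (or add) a metric mid-call."""
        self.weights[name] = weight

    def score(self, observations):
        """observations: metric name -> observed value for this call."""
        return sum(self.weights.get(name, 0) * value
                   for name, value in observations.items())

metrics = MetricSet({"hot_words": 10, "formal_language": 5, "out_of_hours": 20})
obs = {"hot_words": 3, "formal_language": 4}   # counts observed so far
print(metrics.score(obs))                      # → 50

metrics.update("voiceprint_match", 100)        # new technique learned mid-call
obs["voiceprint_match"] = 1
print(metrics.score(obs))                      # → 150
```

The mid-call `update` corresponds to the arrangement, described earlier, in which the metrics upon which the score is calculated can change while the call is analysed.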
The fourth layer 214 of the neural network 200 is arranged to output a final score of the analysis performed by the neural network 200 to the score generation unit 304. The score from the neural network 200 is then combined with the score from the biometric unit 306.
Should the score output from the score generation unit 304 exceed a threshold then the call is deemed to be fraudulent, malicious and/or nuisance and the action is executed.
In the embodiment being described, this action is termination of the voice call. However, in other embodiments, the predetermined action may be any one or more of the following: i. terminating the voice call; ii. forwarding the voice call to one or more telephone numbers stored in a database; iii. recording at least a portion of the voice call; iv. storing the call line identification number in at least one database; and v. indicating to a user that the voice call may be fraudulent.
There now follows an example of how the system 100 is used to analyse a voice call between a user of a smartphone 106c (who may be termed a target) and a person (ie a scammer) trying to defraud that user. The example illustrates how the example neural network 200 described above is used to analyse the speech/text within the voice call and assigns a score both to specific words, to sentence constructions and to intentions.
These assigned scores are accumulated to determine an overall risk that the voice call is fraudulent, malicious and/or nuisance. Such calls (ie those that are fraudulent, malicious and/or nuisance) may be thought of as being problematic voice calls.
'HMRC' (Her Majesty's Revenue & Customs service) scams are now commonplace, where the target is called by a scammer pretending to work for HMRC.
The scammer may initiate the voice call with the following sentence: 'Good afternoon. I am calling from Her Majesty's Revenue service to discuss fraudulent activity we have discovered on your account.' While the call is in progress, the system 100 is arranged to score the voice call on a variety of different metrics as described in relation to the described embodiments above. In the particular example being described the voice call is scored on the following metrics: 1. Time of day - Is the time of day appropriate to the caller's purported identity? For example, banks and government organisations will not generally call out of office hours. Thus, if a call is made out of hours, a score indicating higher suspicion is assigned (which may be a higher or a lower score depending upon whether the score is incremented or decremented towards the predetermined threshold).
2. Length of conversation -Scam calls are frequently transacted over longer periods of time. Thus, longer voice calls are, in this example, scored with more relevance.
3. Biometrics -Over time the voice print of an incoming call can be matched against a database of known scammers' voices. Thus, if a voice print is matched to an existing profile, then the voice call, in this example, may be scored with more relevance.
4. Formal language - To convey authority scammers may use more formal language, including the use of fewer contractions, i.e. 'We have' rather than 'we've' and 'I am' rather than 'I'm'. Also, formal introductions such as 'Good morning/afternoon/evening.' may be used. As such, in the example being described the system is arranged to provide a score of more relevance if such formal language is used within the voice call.
5. 'Hot' words & phrases -These may include organisations used in the creation of false realities, such as banks, or government bodies.
Therefore, in the example being described, the system is arranged to provide a score of more relevance if such words and/or phrases are used within the voice call.
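The 'formal language' metric of point 4, in which a diminished level of contractions indicates a more formal register, may be sketched as follows. The contraction pattern and the list of full forms checked are illustrative assumptions only:

```python
# Hypothetical sketch of the 'formal language' metric: a lower proportion of
# contractions ("we've", "I'm") relative to full forms ("we have", "I am")
# suggests the more formal register described above.
import re

CONTRACTION = re.compile(r"\b\w+'(?:ve|m|re|ll|s|d|t)\b", re.IGNORECASE)

def formality_signal(text):
    """Return True if the text uses full forms rather than contractions."""
    words = text.split()
    if not words:
        return False
    contractions = len(CONTRACTION.findall(text))
    full_forms = sum(text.lower().count(p) for p in ("we have", "i am", "do not"))
    return full_forms > contractions

print(formality_signal("We have discovered fraudulent activity. I am calling from the bank."))
# → True
print(formality_signal("We've spotted something, I'm calling about it."))
# → False
```

A real implementation would of course use a richer language model than a fixed phrase list, but the comparison between contracted and full forms captures the signal the example relies on.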
Thus, noting the above points 1 to 5, the example being described is arranged to score the sentence as follows: 'Good afternoon' - Formal language. Score of 5 added. Total score 5.
'I am calling from' - The formal language continues. Score of 5 added. Total score 10.
'Her Majesty's Revenue service' - A 'hot' signifier. Score of 10 added. Total score 20.
'to discuss' - Formal language continues. Score of 10 added. Total score 30.
'fraudulent activity' - More formal language. Score of 20 added. Total score 50.
'we have discovered on your account' - This reveals the intention within the sentence. On their own these words might be lower-scoring, but in the context of the words preceding them they score highly because they reveal a clear intention to create a false reality and, in this example, they score 50, taking the total score to 100.
At this point, it will be seen that a score of 100 has been accumulated and the call is terminated; where, in this example, 100 is the threshold score for termination.
If biometrics and time of day were also active factors, the scores may accumulate more rapidly.
In the above, the example has been given where the score starts low and is added to until the pre-determined threshold is reached. The skilled person will appreciate that other systems are possible. For example, the score could start high and be decremented until the pre-determined threshold is reached. There may be other scoring systems that are suitable to reach the pre-determined threshold.
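The incremental scoring walk-through above may be restated as the following sketch. The fragment/score pairs and the threshold of 100 are taken directly from the example; the function itself is an illustrative assumption:

```python
# Sketch of the example: each fragment of the opening sentence contributes
# its score until the termination threshold (100 in the example) is reached.

FRAGMENT_SCORES = [
    ("Good afternoon", 5),                       # formal language
    ("I am calling from", 5),                    # formal language continues
    ("Her Majesty's Revenue service", 10),       # 'hot' signifier
    ("to discuss", 10),                          # formal language continues
    ("fraudulent activity", 20),                 # more formal language
    ("we have discovered on your account", 50),  # clear intention revealed
]

THRESHOLD = 100

def analyse_call(fragments, threshold=THRESHOLD):
    total = 0
    for text, score in fragments:
        total += score
        if total >= threshold:
            return total, "terminate"   # the predetermined action
    return total, "continue"

print(analyse_call(FRAGMENT_SCORES))   # → (100, 'terminate')
```

The decrementing variant mentioned above would simply start `total` high and subtract each contribution, terminating once the total falls to the threshold.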
In the embodiment being described, processing is described as being performed at a specified location, such as on the processing units 106, or the server(s) 102, 104. However, the skilled person will appreciate that in other embodiments processing may be performed at locations other than described here.

Claims (7)

  1. A system arranged to analyse continuous speech within a voice call to determine if the voice call is fraudulent, malicious and / or nuisance, the method comprising: using processing circuitry arranged to: digitise audio signals containing the continuous speech, from the caller and the recipient of the voice call to generate audio data and to identify words contained within the audio data; processing the identified words with an analysis engine which is arranged to process the words to identify the context in which the words are used; to generate a score according to the analysis performed by the analysis engine according to a set of metrics; wherein the calculated score, based upon the analysis performed by the analysis engine, is indicative of the risk that the telephone call is fraudulent, malicious, nuisance; and execute an action, during the voice call, upon the calculated score reaching a pre-determined threshold.
  2. The system of claim 1 wherein the method is arranged to allow update of the metrics as the voice call is analysed.
  3. The system of claim 1 or 2 which is arranged such that the calculated score depends on historic content from the voice call in addition to the identified words.
  5. The system of any preceding claim in which the calculated score is additionally calculated according to any of the following: 1) The occurrence of predetermined words, phrases and sentences; 2) The order of words, phrases and sentences within any conversation; 3) Intonation and change of intonation during the voice call; 4) The volume and change in volume of spoken words, phrases and sentences; 5) The frequency of the change in intonation during the conversation; 6) The repetition of the change in intonation during the conversation; 7) The frequency in the change in volume during the conversation; 8) The repetition in the change in volume during the conversation; 9) The frequency of key words, phrases and sentences; 10) The repetition of key words, phrases and sentences; 11) The recognition of a particular voice during a conversation; 12) The length of a call; 13) A repeat of a call from the recognised voiceprint and/or CLI (ie Caller Line Identifier); 14) The time of day for the call; and 15) The day of the week for the call.
  6. The system of any previous claim wherein the executed action comprises any one or more of: i. Terminating the voice call; ii. Forwarding the voice call to one or more telephone numbers stored in a database; iii. Recording at least a portion of the voice call; and iv. Indicating to a user that the voice call may be fraudulent.
  7. The system of any previous claim wherein the digitised audio signals include any of the following: i. Letters; ii. Words; iii. Numbers; iv. Key tones.
  8. The system of claim 7 wherein the system is arranged to compare the digitised audio signals against data stored in at least one database.
  9. The system of any previous claim wherein the database is automatically updated on the analysis of previous telephone calls.
  10. The system of any previous claim wherein one or more databases are operable to be manually updated.
  11. A machine readable medium containing instructions which when executed cause a mobile telephone to operate with the system of any of claims 1 to 10.
  12. A machine readable data carrier containing instructions which when read by a computer cause that computer to provide the system of any of claims 1 to 10.
  13. A method arranged to analyse continuous speech within a voice call to determine if the voice call is fraudulent, malicious and / or nuisance, the method comprising: using processing circuitry arranged to: digitise audio signals containing the continuous speech, from the caller and the recipient of the voice call to generate audio data and to identify words contained within the audio data; processing the identified words with an analysis engine which is arranged to process the words to identify the context in which the words are used; to generate a score according to the analysis performed by the analysis engine according to a set of metrics; wherein the calculated score, based upon the analysis performed by the analysis engine, is indicative of the risk that the telephone call is fraudulent, malicious, nuisance; and execute an action, during the voice call, upon the calculated score reaching a pre-determined threshold.
  14. The method of claim 13 wherein the method is arranged to allow update of the metrics as the voice call is analysed.
  15.
A system arranged to analyse continuous speech within a voice call to determine if the voice call is fraudulent, malicious and / or nuisance, the method comprising: using processing circuitry arranged to: receive audio data containing digitised audio signals containing the continuous speech, from the caller and the recipient of the voice call; process the received audio data to identify words contained within the audio data; processing the identified words with an analysis engine which is arranged to process the words to identify the context in which the words are used; to generate a score according to the analysis performed by the analysis engine according to a set of metrics; wherein the calculated score, based upon the analysis performed by the analysis engine, is indicative of the risk that the telephone call is fraudulent, malicious, nuisance; and execute an action, during the voice call, upon the calculated score reaching a pre-determined threshold.
  16. A machine readable medium containing instructions which when read by a computer cause a processing circuitry of that computer to: digitise audio signals containing continuous speech, from a caller and a recipient of a voice call to generate audio data and to identify words contained within the audio data; processing the identified words with an analysis engine which is arranged to process the words to identify the context in which the words are used; to generate a score according to the analysis performed by the analysis engine according to a set of metrics; wherein the calculated score, based upon the analysis performed by the analysis engine, is indicative of the risk that the telephone call is fraudulent, malicious, nuisance; and execute an action, during the voice call, upon the calculated score reaching a pre-determined threshold.
GB1907476.4A 2019-05-28 2019-05-28 Multilayer set of neural networks Withdrawn GB2584827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1907476.4A GB2584827A (en) 2019-05-28 2019-05-28 Multilayer set of neural networks

Publications (2)

Publication Number Publication Date
GB201907476D0 GB201907476D0 (en) 2019-07-10
GB2584827A true GB2584827A (en) 2020-12-23

Family

ID=67385384

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1907476.4A Withdrawn GB2584827A (en) 2019-05-28 2019-05-28 Multilayer set of neural networks

Country Status (1)

Country Link
GB (1) GB2584827A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143665B (en) * 2019-10-15 2023-07-14 支付宝(杭州)信息技术有限公司 Qualitative method, device and equipment for fraud

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080020092A1 (en) * 2004-04-30 2008-01-24 Arata Suenaga Method For Improving Keeping Quality Of Food And Drink
US20160105457A1 (en) * 2013-08-30 2016-04-14 Bank Of America Corporation Risk Identification
US20160104476A1 (en) * 2014-10-09 2016-04-14 International Business Machines Corporation Cognitive Security for Voice Phishing Activity
US20180020092A1 (en) * 2016-07-13 2018-01-18 International Business Machines Corporation Detection of a Spear-Phishing Phone Call

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301210A (en) * 2021-04-16 2021-08-24 珠海高凌信息科技股份有限公司 Method and device for preventing harassing call based on neural network and electronic equipment
CN113301210B (en) * 2021-04-16 2023-05-23 珠海高凌信息科技股份有限公司 Method and device for preventing harassment call based on neural network and electronic equipment
US20220377171A1 (en) * 2021-05-19 2022-11-24 Mcafee, Llc Fraudulent call detection
US11882239B2 (en) * 2021-05-19 2024-01-23 Mcafee, Llc Fraudulent call detection

Also Published As

Publication number Publication date
GB201907476D0 (en) 2019-07-10


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)