WO2023069047A1 - A face recognition system to identify the person on the screen - Google Patents


Info

Publication number
WO2023069047A1
Authority
WO
WIPO (PCT)
Prior art keywords
face recognition
content
image
person
face
Prior art date
Application number
PCT/TR2022/051066
Other languages
French (fr)
Inventor
Muvaffak Amasya
Original Assignee
Siskon Endustriyel Otomasyon Sistemleri Sanayi Ve Ticaret Anonim Sirketi
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siskon Endustriyel Otomasyon Sistemleri Sanayi Ve Ticaret Anonim Sirketi filed Critical Siskon Endustriyel Otomasyon Sistemleri Sanayi Ve Ticaret Anonim Sirketi
Publication of WO2023069047A1 publication Critical patent/WO2023069047A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program

Definitions

  • weighted colors or the field color on the screen (400) can be detected as a sign; the type of sport can be added to the filter according to the field color, and the search can then be limited to the staff of the sports clubs in that sports branch.
  • the processor unit is configured to request feedback on whether the matching person is correct from the said user interface and to edit the face recognition algorithm used in the face recognition process according to the feedback it receives.
  • the face recognition system (10) includes a subtitle database (253).
  • the subtitle database (253) contains subtitles in various languages associated with content names. If the processor unit detects that a text in the auxiliary information region (412) is a subtitle, it queries that subtitle in the subtitle database (253). As a result of the query, it adds the content names containing the subtitle as parameters to the filter. For example, when a subtitle is detected in the auxiliary information region (412) and it is determined that this subtitle is “Legolas! What do your Elf eyes see?”, the contents containing this subtitle are determined from the subtitle database. In this case, for example, one of the contents containing this subtitle can be determined as "Lord of the Rings: Two Towers".
  • the filter of the processor unit is then arranged to include the parameter "Lord of the Rings: Two Towers".
  • the processor unit determines the matches by running the face recognition algorithm based on the actors of the content named "Lord of the Rings: Two Towers" from the data source (251). In this case, for example, it can be ensured that the person's name "Viggo Mortensen" is displayed in the user interface.
  • the processor unit may also display the data received from the person detail information source (252) in the user interface. This information may include, for example, the date and place of birth of the identified person, movies they have played in, etc. Thus, by narrowing the pool of people for face recognition, the probability of correct results is increased and system resources are used in a reduced way.
  • the said face recognition algorithms are one of the face recognition algorithms known in the art.
  • the person detected by face recognition can be displayed in the user interface in such a way that the first image and a face image are displayed at the same time.
  • these images can be provided side by side.
  • it can be ensured that these images are displayed overlapping by changing the transparency ratio.
  • the processor unit can update the face recognition algorithm or edit the filter according to the input it receives from the user terminal (100).
  • the screen (400) and the user terminal (100) are integrated.
  • This embodiment may be provided in a smart television.
  • the image capture unit (110) captures the screen (400) image.
  • when the screen (400) and the user terminal (100) are integrated, they may be a computer, a smart television, or a tablet computer.
  • Text detection and text reading processes in the auxiliary information region (412) can be performed with optical character recognition algorithms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention is a face recognition system for identifying the said person in at least one image frame including the face of at least one person on a screen (400) that allows displaying visual content, comprising a user terminal (100) having an image capture unit (110) for capturing the image of the said screen (400). Accordingly, it comprises a processor unit that is associated to communicate with the said image capture unit (110), the said processor unit is configured to receive the first image captured by the said image capture unit as an input, detect at least one face region (411) in the said first image, detect at least one auxiliary information region (412) in the said first image, detect at least one sign and at least one text in the said auxiliary information region (412), create a filter containing at least one parameter value according to at least one of the detected sign and text, access a database storing the person information comprising at least one face image of the persons such that the said person information is associated with at least one of the said parameters, and perform a face recognition process based on the faces of the persons who fit the filter created for at least one of the faces in the said face regions (411).

Description

A FACE RECOGNITION SYSTEM TO IDENTIFY THE PERSON ON THE SCREEN
TECHNICAL FIELD
The invention relates to identification systems for identifying the said person in at least one image frame including the face of at least one person on a screen that allows displaying visual content, comprising a user terminal having an image capture unit for capturing the image of the said screen.
BACKGROUND
People can watch content such as series, movies, and sports competitions by playing live broadcasts or recorded data. People may want to know who the persons in this content, such as actors and athletes, are. For this purpose, the user can learn the identities of these individuals by researching electronic program guides and web pages containing information about the watched content.
US application numbered US2014068670 discloses a system and method in which additional information about actors in series or movie scenes can be accessed in a system with an optional content playback service. The existing additional information can be accessed through a user interface where the user can interact on the screen, and the pictures and detailed information of the actors can be accessed. However, for this, additional information must be associated with each video in advance. In addition, it is not possible to identify athletes in events such as live sports competitions with this method.
Application numbered US2009091629 discloses a system in which a screenshot is taken on the playback device, face recognition is applied to the face in this screenshot, and the matches in the database used in face recognition are presented to the user. However, because such a scan searches the databases very broadly, substantial system resources are used, and since the number of similar people may be high, the probability of obtaining wrong results increases. In addition, the device on which the content is watched must be specially configured, and it is not possible to identify people on screens that are not programmed for this task. As a result, all the problems mentioned above have made it necessary to make an innovation in the relevant technical field.
BRIEF DESCRIPTION OF THE INVENTION
The present invention relates to a system to eliminate the above-mentioned disadvantages and bring new advantages to the relevant technical field.
An object of the invention is to provide a system and method to detect the people in the content displayed on the screen, such as the television screen, with less use of system resources and with increased accuracy.
Another object of the invention is to provide a system and method to obtain accurate results close to those of advanced face recognition algorithms by using simpler face recognition algorithms.
Another object of the invention is to reduce the system resources used by existing face recognition algorithms and to increase their accuracy.
Another object of the invention is to provide a system to identify the people on the screen without requiring the special programming of the device with the screen.
To achieve all the objects mentioned above and that will emerge from the following detailed description, the present invention is a recognition system that includes a user terminal with an image capture unit for identifying the said person in at least one image frame containing the face of at least one person on a screen that enables the display of image content. Accordingly, it comprises a processor unit that is associated to communicate with the said image capture unit, and the said processor unit is configured to:
- receive a first image captured by the image capture unit as input,
- detect at least one face region in the said first image,
- detect at least one auxiliary information region in the said first image,
- detect at least one sign and at least one text in the said auxiliary information region,
- create a filter comprising at least one parameter value with respect to at least one of the detected signs and text,
- access a database storing the person information comprising at least one face image of the persons such that the said person information is associated with at least one of the said parameters, and
- perform the face recognition process based on face images of persons who fit the filter created for at least one of the faces in the said face regions. Thus, by searching only in a person pool suitable for filters, the possibility of obtaining correct results is increased and system resources are used in a reduced way. In addition, accurate results can be obtained by using less complicated face recognition algorithms.
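The steps above can be sketched as a minimal pipeline. All names below (function names, filter keys, the toy person database) are illustrative assumptions rather than the patent's implementation; a real system would use an actual face detector, an OCR engine, and a face-matching step in place of the stubs.

```python
def build_filter(signs, texts):
    """Create filter parameters from detected signs and texts (illustrative)."""
    params = {}
    for text in texts:
        if text.lower() == "live":
            params["live"] = True        # live broadcast marker
        else:
            params["content_name"] = text  # e.g. a detected series name
    for sign in signs:
        params.setdefault("channel", sign)  # e.g. a detected channel logo
    return params

def recognize(face_regions, params, person_db):
    """Run recognition only against persons matching the filter parameters.
    A real system would compare face embeddings of `face_regions` here;
    this sketch only narrows the candidate pool."""
    return [p for p in person_db
            if all(p.get(k) == v for k, v in params.items() if k != "live")]

# Hypothetical person database associated with filter parameters.
person_db = [
    {"name": "Actor A", "content_name": "Series X", "channel": "CH1"},
    {"name": "Actor B", "content_name": "Series Y", "channel": "CH1"},
]
params = build_filter(signs=["CH1"], texts=["Series X"])
matches = recognize(face_regions=[(10, 10, 50, 50)], params=params,
                    person_db=person_db)
```

Only persons tagged with the detected content name and channel are ever compared against the detected faces, which is the source of the claimed resource savings.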
A possible embodiment of the invention is characterized in that the said database is configured to include person information indicating the time intervals at which persons are present in the said content or an organization related to the content, and the processor unit is configured to:
- detect that it has received a user entry from the user terminal that the content is broadcasted live, and
- add the current date and time information to the filter.
Another possible embodiment of the invention is characterized in that the said database is configured to include person information indicating the time intervals at which persons are present in the said content or an organization related to the content, and the processor unit is configured to:
- question whether one of the said texts expresses that the content is being broadcasted live; and
- add the current date and time information to the filter if it is determined that it expresses that it is broadcasted live. Thus, when it is determined from the texts or signs that the content belongs to a sports competition between the two teams, it is ensured that the filtration process is performed specifically for the people in the existing teams in this sports competition, and the system resources are used in a reduced way and a more accurate result is obtained.
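The live-broadcast branch above can be sketched as follows; the field names (`from`, `to`, `timestamp`) and the roster data are hypothetical, and the current time is passed in explicitly for clarity.

```python
from datetime import datetime

def add_live_filter(texts, params, now):
    """If any detected text indicates a live broadcast, add the current
    date/time to the filter (structure is illustrative)."""
    if any(t.strip().lower() == "live" for t in texts):
        params["timestamp"] = now
    return params

def matches_interval(person, params):
    """True if the person is with the club/content during the filtered time."""
    ts = params.get("timestamp")
    return ts is None or person["from"] <= ts <= person["to"]

now = datetime(2022, 6, 1)
params = add_live_filter(["LIVE"], {}, now)
roster = [
    {"name": "Current player", "from": datetime(2020, 1, 1), "to": datetime(2023, 1, 1)},
    {"name": "Former player", "from": datetime(2010, 1, 1), "to": datetime(2015, 1, 1)},
]
active = [p for p in roster if matches_interval(p, params)]
```

This is how the date/time parameter restricts a live sports search to the current squad rather than every person ever associated with the club.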
Another possible embodiment of the invention is characterized in that if the processor unit detects that the detected text is a subtitle text, it is configured to:
- access a subtitle database where subtitles related to content are stored in relation to content names,
- query the detected text in the subtitle database,
- detect the content names containing subtitles that match the detected text, and
- add the detected content name to the filter. Thus, since only the people in the content matching the subtitle are searched, the likelihood of the face recognition process obtaining correct results is increased and the system resources used in doing so are significantly reduced.
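The subtitle-lookup steps above can be sketched with a toy subtitle database; the dictionary structure and the substring match below are illustrative assumptions (a production system would likely use fuzzy or indexed text search).

```python
def filter_by_subtitle(detected_text, subtitle_db):
    """Return content names whose stored subtitles contain the detected text.
    `subtitle_db` maps content name -> list of subtitle lines (illustrative)."""
    return [name for name, subs in subtitle_db.items()
            if any(detected_text in s for s in subs)]

# Hypothetical subtitle database keyed by content name.
subtitle_db = {
    "Lord of the Rings: Two Towers": ["Legolas! What do your Elf eyes see?"],
    "Some Other Film": ["A different line."],
}
hits = filter_by_subtitle("What do your Elf eyes see?", subtitle_db)
```

Each returned content name would then be added to the filter as a parameter before face recognition runs.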
Another possible embodiment of the invention is characterized in that the processor unit is configured to display, on a user interface of the user terminal, at least one person matched with the face in the face region as a result of the face recognition process.
Another possible embodiment of the invention is characterized in that the processor unit is configured to request feedback from the said user interface on whether the matching person is correct or not and to edit the face recognition algorithm used in the face recognition process according to the feedback it receives.
Another possible embodiment of the invention is characterized in that the processor unit is configured to access a person detail information source containing additional information about the person and to display additional information about at least one of the matching persons in the user interface.
Another possible embodiment of the invention is characterized in that the processor unit obtains a face image of at least one person from the said additional information.
Another possible embodiment of the invention is characterized in that the processor unit is configured to display the matching results in the user interface to display the face image of the matching person and the first image on the same screen. Thus, the user can easily determine whether the match is correct or not.
Another possible embodiment of the invention is characterized in that the processor unit is configured to display the created filter in the user interface for the user to approve or edit, and to perform the face recognition processes if it receives input from the user terminal regarding the approval.
Another possible embodiment of the invention is characterized in that the said sign is at least one of a broadcast channel symbol, a symbol for the content, a sports club jersey, and a sports club symbol.
Another possible embodiment of the invention is characterized in that the said text is at least one of the subtitle, content name, content section information, text about whether the content is broadcasted live, text about the time the content is broadcasted, sports club name, sports club abbreviation, and competition score text.
Another possible embodiment of the invention is characterized in that the said screen and the user terminal are integrated, and the image capture unit has hardware configured to receive the screen image of the user terminal. Thus, identification can be performed by taking a screenshot on a screen such as a computer or smart TV.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows a representative view of the system.
Figure 2 shows a representative view of the first image.
Figure 3 shows a representative view of the system.
Figure 4 shows a representative view of the first image.
Figure 5 shows a schematic view of the system.
DETAILED DESCRIPTION OF THE INVENTION
In this detailed description, the subject matter of the invention is described by using examples only for a better understanding, which will have no limiting effect.
Referring to Figure 1, the invention is a recognition system comprising a user terminal (100) having an image capture unit (110) for receiving an image of the said screen (400), for the identification of the said person in at least one image frame including the face of at least one person on a screen (400) enabling the display of video content. A processor unit (not shown in the figure) allows narrowing the database to be searched for face recognition by using the text and signs in the image in addition to the face, i.e. applying an additional filter to the search, in order to identify the person matching a face in the images captured by the image capture unit (110). The face recognition system (10) comprises a user terminal (100) to enable image acquisition and display for the user; a server (200) that can communicate with the user terminal (100) through a communication network (300); and a data source (251) that has a database storing the person information containing at least one face image of the persons, associated with at least one of the said parameters, to which the said server (200) provides access to perform face recognition operations. In more detail, the face recognition system (10) may also include a person detail information source (252), accessible to the server (200), containing the details of the persons.
The said processor unit may be a terminal processor (120) of the user terminal (100). In a possible embodiment of the invention, the processor unit may be the server processor (210) of a server (200). In another embodiment of the invention, the processor unit may include a co-operating terminal processor (120) and server processor (210) to perform some of the steps of the invention.
The said user terminal (100) may include a communication unit (130) for communicating with the processor unit through the communication network (300). The communication network (300) may be the internet or a similar network. The communication unit (130) may be, for example, hardware that enables a wireless connection to the internet. The user terminal (100) may include a user interface to enable data to be presented to the user. The image capture unit (110) of the user terminal (100) may be a camera. The terminal processor (120) of the user terminal (100) may be associated with a memory unit (140) and may include software consisting of command lines recorded in the said memory unit (140) that contribute to the operation of the invention. The user terminal (100) may be a smartphone, tablet computer, smartwatch, general-purpose computer, etc.
The screen (400) of this description is a device for displaying content. It may be a monitor, television, projector, etc. The screen (400) may play content received from a content source or stored in a memory unit (140). The screen (400) may receive broadcast content such as, for example, satellite broadcast, terrestrial broadcast, or cable broadcast; it may also be able to receive content from an optional video source or a memory. The content may be a video or just an image frame; it may be, for example, a movie, a series, a sports competition, etc.
As the characteristic aspect of the invention, the processor unit receives as input the first image captured by the image capture unit (110). Then, it detects at least one face region (411) in the said first image. The face region (411) herein refers to the portion where the faces of the persons in the first image are detected. In the first image, there may be one or more people. The processor unit then detects at least one auxiliary information region (412) in the said first image. The auxiliary information region (412) may be in the body part associated with the detected face, in the corners of the screen (400), or in the region close to the lower edge of the screen (400). Auxiliary information regions are the regions where information such as the channel name, sports competition score, subtitle, athlete jersey, or content name (series name, movie name, etc.) is located. A representative view of the face region (411) and the auxiliary regions in the first image is given in Figure 2.
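The locations described above suggest a simple geometric check for candidate auxiliary information regions. The sketch below is an illustrative assumption, not part of the patent: the 0.15/0.85 fractions and the function name are hypothetical thresholds for "corner" and "lower edge".

```python
def likely_auxiliary_region(box, frame_w, frame_h):
    """Heuristic following the description: auxiliary information (channel
    logo, score, subtitle, content name) tends to appear in the corners or
    near the lower edge of the screen. `box` is (x, y, w, h) in pixels;
    the 0.15/0.85 thresholds are illustrative assumptions."""
    x, y, w, h = box
    near_bottom = y + h > 0.85 * frame_h
    in_corner = (x < 0.15 * frame_w or x + w > 0.85 * frame_w) and \
                (y < 0.15 * frame_h or y + h > 0.85 * frame_h)
    return near_bottom or in_corner

# A subtitle box near the bottom of a 1920x1080 frame qualifies,
# while a box in the middle of the frame does not.
subtitle_box = (100, 950, 200, 50)
center_box = (900, 500, 100, 50)
```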
In a possible embodiment of the invention, the processor unit may access the electronic program guide (EPG) of the channel when the channel name is detected and may determine the name of the content accordingly. It then updates the filter according to the name of the content.
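The EPG lookup in this embodiment can be sketched as a simple table query keyed by channel name and current time. The channel name, programme names, and table layout below are invented for illustration only.

```python
from datetime import datetime

# Hypothetical EPG lookup: given the detected channel name and the current
# time, return the programme (content) name to add to the filter.
# EPG_TABLE entries are illustrative, not real broadcast data.
EPG_TABLE = {
    "Channel7": [("18:00", "20:00", "Evening News"),
                 ("20:00", "22:00", "Medcezir")],
}

def current_programme(channel, now):
    hhmm = now.strftime("%H:%M")
    for start, end, name in EPG_TABLE.get(channel, []):
        if start <= hhmm < end:
            return name
    return None  # channel unknown or no programme scheduled
```

The returned content name would then be added to the filter as a parameter value.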
The processor unit detects at least one sign and at least one text in the auxiliary information region (412). The processor unit then creates a filter containing at least one parameter value with respect to at least one of the detected signs and texts. For example, when it detects a series name in the text, the processor unit sets the series name as a parameter. The processor unit then accesses the aforementioned database and performs face recognition for at least one of the faces in the said face regions (411), based on the face images of persons who fit the created filter. In the case where the series name is a parameter, face recognition is performed using only the images of people indexed as players in this series. Thus, fewer system resources are used, and faster and more accurate results can be obtained.
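The filtering step above can be sketched as follows. This is a minimal illustration under stated assumptions: the person records, face encodings, and the nearest-neighbour distance are invented stand-ins, not the patent's actual database schema or recognition algorithm.

```python
# Sketch of filtered face recognition: the database associates each person
# with parameter values (here, content names), and recognition compares the
# query face only against candidates that satisfy the filter.
# All names, contents, and encodings are illustrative.

PERSON_DB = [
    {"name": "Actor A", "contents": ["Series X"], "encoding": [0.1, 0.2]},
    {"name": "Actor B", "contents": ["Series X"], "encoding": [0.4, 0.1]},
    {"name": "Actor C", "contents": ["Movie Y"],  "encoding": [0.9, 0.8]},
]

def candidates(filter_params):
    """Return only the persons whose indexed contents match the filter."""
    wanted = filter_params.get("content_name")
    return [p for p in PERSON_DB if wanted in p["contents"]]

def recognise(face_encoding, filter_params):
    """Nearest-neighbour match restricted to the filtered candidate pool."""
    pool = candidates(filter_params)
    if not pool:
        return None
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(p["encoding"], face_encoding))
    return min(pool, key=dist)["name"]
```

Because the pool is narrowed before matching, fewer comparisons are made, which is the resource saving the description refers to.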
In a possible embodiment of the invention, the database may also include the date and time ranges in which persons are included in the contents or in organizations associated with the content. For example, the date range in which a person appears in a series, or the date range in which an athlete belongs to a sports club, can also be associated with the persons. Accordingly, in a possible embodiment of the invention, the processor unit is configured to add the current date and time information to the filter if it receives user input from the user terminal (100) indicating that the content is live-streamed. In another possible embodiment of the invention, the processor unit queries whether one of the said texts expresses that the content is broadcasted live, and if so, adds the current date and time information to the filter. This text may be, for example, the expression "live". Thus, by updating the filter according to the instantaneous date and time, for example when it is desired to recognize a person in a football match, the search is made over the current staff and players of the football clubs. From other auxiliary information, as shown in Figure 3 or Figure 4, club abbreviations can be detected, and scanning can be performed according to, for example, a club name parameter determined from these abbreviations.
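The date-interval filter for live content can be sketched as below. The roster records and club abbreviation are invented for illustration; the point is only that a live broadcast restricts the candidate pool to persons whose membership interval contains the current date.

```python
from datetime import date

# Sketch of the live-content date filter: each person record carries the
# interval in which they belong to the organisation (e.g. a sports club),
# and the current date, added to the filter for live broadcasts, restricts
# the pool to currently-active members. All data is illustrative.

ROSTER = [
    {"name": "Player A", "club": "GS", "from": date(2019, 7, 1), "to": date(2023, 6, 30)},
    {"name": "Player B", "club": "GS", "from": date(2015, 7, 1), "to": date(2018, 6, 30)},
]

def active_members(club, on_date):
    """Persons in the club whose membership interval contains on_date."""
    return [p["name"] for p in ROSTER
            if p["club"] == club and p["from"] <= on_date <= p["to"]]
```

For a match broadcast live on 2022-10-22, only "Player A" would remain in the recognition pool.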
In another possible embodiment of the invention, the dominant colors or the field color on the screen (400) can be detected as a sign, a sport-type parameter determined from the field color can be added to the filter, and scanning can then be performed over the staff of the sports clubs in that sports branch.
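One way to realize this embodiment is to quantize the frame's pixels into coarse color labels and map the dominant label to a sport branch. The color-to-sport mapping below is an assumption for illustration, not part of the patent text.

```python
from collections import Counter

# Illustrative mapping from dominant field colour to a sport-type filter
# parameter. The colour labels and the mapping itself are assumptions.
SPORT_BY_FIELD_COLOUR = {
    "green":  "football",    # grass pitch
    "orange": "basketball",  # hardwood court
    "blue":   "swimming",    # pool
}

def dominant_colour(pixels):
    """pixels: iterable of coarse colour labels after quantisation."""
    return Counter(pixels).most_common(1)[0][0]

def sport_filter(pixels):
    """Return the sport-type parameter for the dominant field colour."""
    return SPORT_BY_FIELD_COLOUR.get(dominant_colour(pixels))
```

The returned sport type would then restrict the scan to clubs of that branch.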
The processor unit is configured to request feedback on whether the matching person is correct from the said user interface and to edit the face recognition algorithm used in the face recognition process according to the feedback it receives.
In a possible embodiment, the face recognition system (10) includes a subtitle database. The subtitle database contains subtitles in various languages associated with content names. If the processor unit detects that a text in the auxiliary information region (412) is a subtitle, it queries that subtitle in the subtitle database (253). As a result of the query, it adds the content names containing the subtitle as parameters to the filter. For example, when a subtitle is detected in the auxiliary information region (412) and it is determined that this subtitle is "Legolas! What do your Elf eyes see?", the contents containing this subtitle are determined from the subtitle database. In this case, one of the contents containing this subtitle can be determined as "The Lord of the Rings: The Two Towers". The processor unit filter is arranged to include the parameter "The Lord of the Rings: The Two Towers". The processor unit then determines the matches by running the face recognition algorithm based on the players of the content with this content name from the data source (251). In this case, for example, it can be ensured that the person's name "Viggo Mortensen" is displayed in the user interface as a result. The processor unit may also display the data received from the person detail information source (252) in the user interface. This information may include, for example, the date and place of birth of the identified person, the movies they have played in, etc. Thus, by narrowing the pool of people for face recognition, the probability of a correct result is increased and fewer system resources are used.
The said face recognition algorithm may be any face recognition algorithm known in the art.
In a possible embodiment of the invention, the face image of the person detected by face recognition is displayed in the user interface at the same time as the first image. Thus, the user can easily observe whether the match is correct or not. In a possible embodiment of the invention, these images can be presented side by side. In a possible embodiment of the invention, these images can be displayed overlapping each other, with an adjustable transparency ratio.
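The overlapping display mode amounts to alpha-blending the matched face image over the first image. The following minimal sketch represents images as nested lists of grey values for brevity; a production system would blend full-colour pixel buffers.

```python
# Minimal alpha-blend sketch for the overlapping display mode: the matched
# face image is blended over the first image with an adjustable
# transparency ratio. Grey-value grids stand in for real images.

def blend(first_image, face_image, alpha):
    """alpha = 0.0 shows only the first image, 1.0 only the face image."""
    return [[round((1 - alpha) * a + alpha * b)
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first_image, face_image)]
```

Varying `alpha` lets the user fade between the broadcast frame and the database face image to judge the match.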
In a possible embodiment of the invention, an entry is requested from the user interface as to whether the matched person is the right person. The processor unit can update the face recognition algorithm or edit the filter according to the input it receives from the user terminal (100).
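One simple form such an update could take is adjusting a match-distance threshold from the feedback. The update rule, step size, and bounds below are assumptions for illustration; the patent does not specify how the algorithm is edited.

```python
# Sketch of feedback-driven tuning (an assumed rule, not the patent's):
# a confirmed match relaxes the acceptance threshold slightly, a rejected
# match tightens it, clamped to a sane range.

def update_threshold(threshold, correct, step=0.02, lo=0.30, hi=0.70):
    """Return the new match-distance threshold after one feedback event."""
    threshold += step if correct else -step
    return min(hi, max(lo, threshold))
```

Repeated user feedback would thus gradually calibrate how strict the matcher is.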
In a possible embodiment of the invention, the screen (400) and the user terminal (100) are integrated. This embodiment may be provided in a smart television. The image capture unit (110) captures the screen (400) image. In this embodiment, the integrated screen (400) and user terminal (100) may be a computer, a smart television, or a tablet computer.
Text detection and text reading processes in the auxiliary information region (412) can be performed with optical character recognition algorithms.
The scope of protection of the invention is specified in the attached claims and cannot be limited to those explained for exemplary purposes in this detailed description. It is evident that a person skilled in the art may produce similar embodiments in light of the above without departing from the main theme of the invention.
REFERENCE NUMBERS GIVEN IN THE FIGURES
10 Face recognition system
100 User terminal
110 Image capture unit
120 Terminal processor
130 Communication unit
140 Memory unit
200 Server
210 Server processor
251 Data source
252 Person detail information source
253 Subtitle database
300 Communication network
400 Screen
410 First image frame
411 Face region
412 Auxiliary information region


CLAIMS
1. A face recognition system (10) for identifying the said person in at least one image frame including the face of at least one person on a screen (400) that allows displaying visual content, comprising a user terminal (100) having an image capture unit (110) for capturing the image of the said screen (400), characterized in that it comprises a processor unit associated to communicate with the said image capture unit (110), and the processor unit is configured to:
- receive a first image captured by the image capture unit as input,
- detect at least one face region (411) in the said first image,
- detect at least one auxiliary information region (412) in the said first image,
- detect at least one of at least one sign and at least one text in the said auxiliary information region (412),
- form a filter comprising at least one parameter value with respect to at least one of the detected signs and text,
- access a database storing the person information comprising at least one face image of the persons such that the said person information is associated with at least one of the said parameters, and
- perform the face recognition process based on face images of persons who fit the filter created for at least one of the faces in the said face regions (411).
2. A face recognition system (10) according to claim 1, characterized in that the said database is configured to include the person information indicating the time intervals when the persons are in the said content or in an organization related to the content, and the processor unit is configured to
- detect that it has received a user entry from the user terminal that the content is being broadcasted live, and
- add the current date and time information to the filter.
3. A face recognition system (10) according to claim 1, characterized in that the said database is configured to include the person information indicating the time intervals when the persons are in the said content or in an organization related to the content, and the processor unit is configured to
- query whether one of the said texts expresses that the content is being broadcasted live, and
- add the current date and time information to the filter if it is determined that the content is broadcasted live.
4. A face recognition system (10) according to claim 1, characterized in that, if the processor unit detects that the detected text is a subtitle text, it is configured to
- access a subtitle database (253) where subtitles related to content are stored in relation to content names,
- query the detected text in the subtitle database (253),
- detect the content names containing subtitles that match the detected text, and
- add the detected content name to the filter.
5. A face recognition system (10) according to claim 1, characterized in that the processor unit is configured to display, in the user interface of the user terminal (100), at least one person matched to the face in the face region (411) as a result of the face recognition process.
6. A face recognition system (10) according to claim 5, characterized in that the processor unit is configured to request feedback on whether the matching person is correct from the said user interface and to edit the face recognition algorithm used in the face recognition process according to the feedback it receives.
7. A face recognition system (10) according to claim 5, characterized in that the processor unit is configured to access a person detail information source (252) containing additional information about the person and to display additional information about at least one of the matching persons in the user interface.
8. A face recognition system (10) according to claim 7, characterized in that the said additional information comprises a face image of at least one person.
9. A face recognition system (10) according to claim 8, characterized in that the processor unit is configured to display the matching results on the user interface such that the face image of the matching person and the first image are displayed on the same screen (400).
10. A face recognition system (10) according to claim 1, characterized in that the processor unit is configured to display the created filter in the user interface for the user to approve or edit, and to perform the face recognition processes in case of receiving an input related to the approval from the user terminal (100).
11. A face recognition system (10) according to claim 1, characterized in that the said sign is at least one of a broadcast channel symbol, a symbol related to the content, a sports club jersey, and a sports club symbol.
12. A face recognition system (10) according to claim 1, characterized in that the said text is at least one of a subtitle, a content name, content section information, a text on whether the content is broadcasted live, a text on the time the content is broadcasted, a sports club name, a sports club abbreviation, and a competition score text.
13. A face recognition system (10) according to claim 1, characterized in that the said processor unit is a terminal processor (120).
14. A face recognition system (10) according to claim 1, characterized in that the said processor unit is a server processor (210).
15. A face recognition system (10) according to claim 1, characterized in that the said screen (400) and the user terminal (100) are integrated, and the image capture unit (110) has hardware configured to receive the screen image of the user terminal (100).
PCT/TR2022/051066 2021-10-22 2022-09-30 A face recognition system to identify the person on the screen WO2023069047A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TR2021/016527A TR2021016527A2 (en) 2021-10-22 2021-10-22 A FACE RECOGNITION SYSTEM TO IDENTIFY PEOPLE ON THE SCREEN
TR2021/016527 2021-10-22

Publications (1)

Publication Number Publication Date
WO2023069047A1 true WO2023069047A1 (en) 2023-04-27

Family

ID=85113409


Country Status (2)

Country Link
TR (1) TR2021016527A2 (en)
WO (1) WO2023069047A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114640863A (en) * 2022-03-04 2022-06-17 广州方硅信息技术有限公司 Method, system and device for displaying character information in live broadcast room and computer equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
US20120045093A1 (en) * 2010-08-23 2012-02-23 Nokia Corporation Method and apparatus for recognizing objects in media content
US20140282660A1 (en) * 2013-03-14 2014-09-18 Ant Oztaskent Methods, systems, and media for presenting mobile content corresponding to media content
US20160371534A1 (en) * 2015-06-16 2016-12-22 Microsoft Corporation Automatic recognition of entities in media-captured events


Also Published As

Publication number Publication date
TR2021016527A2 (en) 2021-11-22
