CN112560783A - Methods, apparatus, systems, media and products for assessing a state of interest


Info

Publication number
CN112560783A
Authority
CN
China
Prior art keywords
attention
loss
evaluated
loss function
image information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011572400.1A
Other languages
Chinese (zh)
Inventor
王志 (Wang Zhi)
丘宇翔 (Qiu Yuxiang)
王正宇 (Wang Zhengyu)
刘威畅 (Liu Weichang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd
Priority to CN202011572400.1A
Publication of CN112560783A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241: Advertisements
    • G06Q 30/0251: Targeted advertisements
    • G06Q 30/0269: Targeted advertisements based on user profile or attribute
    • G06Q 30/0271: Personalized advertisement
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172: Classification, e.g. identification


Abstract

The present disclosure provides a method for evaluating an attention state, applied to an advertising screen terminal, including: acquiring face image information of an object to be evaluated, the object to be evaluated being an object captured in a preset capture area associated with the advertising screen terminal; generating a face information feature vector based on the face image information; inputting the face information feature vector into an attention recognition model to obtain an output result of the attention recognition model, the attention recognition model being trained on sample face information feature vectors and used to recognize whether an object pays attention to the advertising screen terminal; and determining an attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, the attention state evaluation result being either attention or non-attention. An apparatus, a computer system, a computer-readable storage medium, and a computer program product for evaluating an attention state are also provided.

Description

Methods, apparatus, systems, media and products for assessing a state of interest
Technical Field
The present disclosure relates to the field of computer technology and internet technology, and more particularly, to a method, an apparatus, a computer system, a computer-readable storage medium, and a computer program product for evaluating a state of interest.
Background
With the rapid development of artificial intelligence, automatic control, communication, and computer technologies, enterprises of many kinds build various offline channels to place advertisements, such as building advertisements, subway advertisements, and elevator advertisements. Only by properly monitoring and measuring the effect of offline advertisement delivery can delivery accuracy and timeliness be ensured in the digital era.
In realizing the concept of the present disclosure, the inventors found that the related art has at least the following problem: advertisement-screen attention statistics, used as an evaluation method for monitoring and measuring the effect of offline advertisement delivery, evaluate attention based on posture or dwell time and therefore suffer from low recognition accuracy.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, a computer system, a computer-readable storage medium, and a computer program product for evaluating a state of interest.
One aspect of the present disclosure provides a method for evaluating an attention state, applied to an advertising screen terminal, including:
acquiring face image information of an object to be evaluated, where the object to be evaluated is an object captured in a preset capture area associated with the advertising screen terminal;
generating a face information feature vector based on the face image information;
inputting the face information feature vector into an attention recognition model to obtain an output result of the attention recognition model, where the attention recognition model is trained on sample face information feature vectors and is used to recognize whether an object pays attention to the advertising screen terminal; and
determining an attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, where the attention state evaluation result is either attention or non-attention.
According to an embodiment of the present disclosure, the attention recognition model includes a convolutional neural network;
the convolutional neural network includes a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, and a fully connected layer, stacked in sequence.
According to an embodiment of the present disclosure, determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model includes:
comparing the output result of the attention recognition model with a first preset threshold, where the output result of the attention recognition model includes the attention probability of the object to be evaluated;
if the attention probability is greater than the first preset threshold, determining the attention state evaluation result of the object to be evaluated as attention; and
if the attention probability is less than or equal to the first preset threshold, determining the attention state evaluation result of the object to be evaluated as non-attention.
According to an embodiment of the present disclosure, after determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, the method further includes:
determining a first number of objects captured in the preset capture area within a preset time period;
determining, based on the attention state evaluation result of each object, a second number of objects whose attention state evaluation result is attention; and
determining the attention of the advertising screen terminal based on the second number and the first number.
According to an embodiment of the present disclosure, the attention recognition model is trained on sample face information feature vectors using a loss function, and the training process includes:
acquiring face image information of a sample object whose attention state result is known;
generating a face information feature vector of the sample object based on the face image information of the sample object;
inputting the face information feature vector of the sample object into an initial convolutional neural network to generate a loss of the loss function, where the parameters of the initial convolutional neural network are preset parameters; and
training the initial convolutional neural network according to the loss of the loss function until the loss of the loss function reaches its minimum, and taking the model corresponding to the minimum loss as the attention recognition model.
According to an embodiment of the present disclosure, after determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, the method further includes:
inputting the face information feature vector corresponding to the face image information of an object determined as attention into the attention recognition model to generate a loss of the loss function; and
performing optimization training on the attention recognition model according to the loss of the loss function until the loss of the loss function reaches its minimum, and taking the model corresponding to the minimum loss as the updated attention recognition model.
According to an embodiment of the present disclosure, determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model includes:
comparing the output result of the attention recognition model with a second preset threshold, where the second preset threshold is greater than the first preset threshold; and
if the attention probability is greater than the second preset threshold, determining that the object to be evaluated is an object determined as attention.
According to an embodiment of the present disclosure, the loss function includes a combination of a classification (softmax) loss function and a center loss function.
According to an embodiment of the present disclosure, the loss function is as follows:

L = L_S + λ L_C

L_S = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}}, \quad L_C = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2

where c_{y_i} denotes the class-center feature of the class y_i to which sample i belongs; x_i denotes the feature input to the fully connected layer; m denotes the mini-batch size; n denotes the number of classes; λ is the weight of the center loss term; W_{y_i} and W_j denote the y_i-th and j-th columns of the weight matrix W of the fully connected layer; and b denotes the bias term.
According to an embodiment of the present disclosure, generating the face information feature vector based on the face image information includes:
extracting face region features from the face image information to obtain face region image information; and
performing vector conversion on the face region image information to generate the face information feature vector.
Another aspect of the present disclosure provides an apparatus for evaluating an attention state, including:
an acquisition module for acquiring face image information of an object to be evaluated, where the object to be evaluated is an object captured in a preset capture area associated with the advertising screen terminal;
a generation module for generating a face information feature vector based on the face image information;
an output module for inputting the face information feature vector into an attention recognition model to obtain an output result of the attention recognition model, where the attention recognition model is trained on sample face information feature vectors and is used to recognize whether an object pays attention to the advertising screen terminal; and
a determination module for determining an attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, where the attention state evaluation result is either attention or non-attention.
According to an embodiment of the present disclosure, the apparatus for evaluating an attention state further includes:
a training module for acquiring face image information of a sample object whose attention state result is known; generating a face information feature vector of the sample object based on the face image information of the sample object; inputting the face information feature vector of the sample object into an initial convolutional neural network to generate a loss of the loss function, where the parameters of the initial convolutional neural network are preset parameters; and training the initial convolutional neural network according to the loss of the loss function until the loss reaches its minimum, taking the model corresponding to the minimum loss as the attention recognition model.
Another aspect of the present disclosure provides a computer system comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the method as described above.
Another aspect of the disclosure provides a computer program product comprising:
computer executable instructions which when executed are for implementing the method as described above.
According to the embodiments of the present disclosure, the method for evaluating an attention state is applied to an advertising screen terminal and includes: acquiring face image information of an object to be evaluated, the object being captured in a preset capture area associated with the advertising screen terminal; generating a face information feature vector based on the face image information; inputting the face information feature vector into an attention recognition model, trained on sample face information feature vectors, to recognize whether the object pays attention to the advertising screen terminal; and determining an attention state evaluation result, either attention or non-attention, according to the output result of the attention recognition model. These technical means at least partially overcome the technical problem of low recognition accuracy caused by evaluating attention based on posture or dwell time, and achieve the technical effect of accurately recognizing whether an object pays attention to the advertising screen.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario of a method and apparatus for evaluating a state of interest according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates an exemplary system architecture to which the disclosed method and apparatus for evaluating a state of interest may be applied;
FIG. 3 schematically illustrates a flow chart of a method for evaluating a state of interest according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a training process for a recognition model of interest according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates an optimization training process for a recognition model of interest according to an embodiment of the present disclosure;
FIG. 6 schematically shows a flow diagram of a method for evaluating a state of interest according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an apparatus for evaluating a state of interest, in accordance with an embodiment of the present disclosure; and
FIG. 8 schematically illustrates a block diagram of a computer system 800 suitable for implementing a method for evaluating a state of interest, in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, such a construction is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, such a construction is generally intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Embodiments of the present disclosure provide a method and an apparatus for evaluating an attention state. The method is applied to an advertising screen terminal and includes: acquiring face image information of an object to be evaluated, where the object to be evaluated is an object captured in a preset capture area associated with the advertising screen terminal; generating a face information feature vector based on the face image information; inputting the face information feature vector into an attention recognition model to obtain an output result of the attention recognition model, where the attention recognition model is trained on sample face information feature vectors and is used to recognize whether an object pays attention to the advertising screen terminal; and determining an attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, where the attention state evaluation result is either attention or non-attention.
Fig. 1 schematically shows an application scenario diagram according to an embodiment of the present disclosure.
As shown in fig. 1, the method for evaluating an attention state is applied to an advertising screen terminal 100. The advertising screen terminal 100 has a display that can play advertisements, and is further provided with a camera acquisition device for capturing face image information. In a typical application scenario of the method provided by the embodiments of the present disclosure, the display on the advertising screen terminal 100 plays an advertisement, the camera acquisition device captures face image information of pedestrians appearing in the preset capture area, and the pedestrians are recognized by the attention recognition model stored on the advertising screen terminal 100, finally determining whether each pedestrian pays attention to the advertisement displayed on the advertising screen terminal.
According to the embodiments of the present disclosure, the main purpose is to recognize and count whether pedestrians pay attention to the advertisements displayed by the advertising screen terminal, so as to evaluate the attention received by offline advertisement delivery and provide effective feedback data to advertisers, who can then adjust their delivery strategy; at the same time, this prevents the delivery operator from fraudulently charging for advertisements played idly when no one is present.
Fig. 2 schematically illustrates an exemplary system architecture 200 to which the methods and apparatus for evaluating a state of interest may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 2 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 2, a system architecture 200 according to this embodiment may include an advertising screen terminal 201, a network 202, and a server 203. The network 202 serves as a medium providing a communication link between the advertising screen terminal 201 and the server 203, and may include various connection types, such as wired and/or wireless communication links.
A user may use the advertising screen terminal 201 to interact with the server 203 via the network 202, for example to receive or send messages. The advertising screen terminal 201 may be equipped with a display such as an advertisement screen and an application that controls its playback; it may further be provided with a camera acquisition device for capturing object information; and it may store an attention recognition model for recognizing whether an object pays attention to the advertisement displayed on the screen.
The advertisement screen terminal 201 may be various electronic devices having a display screen and supporting an image analysis process.
The server 203 may be a server providing various services, such as a background management server (for example only) providing storage support for video information collected by a user using the advertising screen terminal 201. The background management server can analyze and store the received data such as video information, can perform optimization training on the attention recognition model according to the video information, and can feed the updated attention recognition model back to the advertising screen terminal 201.
It should be noted that the method for evaluating the attention state provided by the embodiment of the present disclosure may be executed by the advertisement screen terminal 201, or may also be executed by other terminal devices different from the advertisement screen terminal 201. Accordingly, the apparatus for evaluating the attention state provided by the embodiment of the present disclosure may also be disposed in the advertisement screen terminal 201, or in other terminal devices different from the advertisement screen terminal 201.
For example, the attention recognition model and the acquired face image information may be stored originally in the advertising screen terminal 201, or stored on an external storage device and imported into the advertising screen terminal 201. The advertising screen terminal 201 may then execute the method for evaluating an attention state locally, or send the attention recognition model and the collected face image information to another terminal device, server, or server cluster, which then executes the method for evaluating an attention state on the received images to be processed.
It should be understood that the numbers of advertising screen terminals, networks, and servers in fig. 2 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
FIG. 3 schematically shows a flow chart of a method for evaluating a state of interest according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes operations S301 to S304.
In operation S301, face image information of an object to be evaluated is acquired, where the object to be evaluated is an object captured in a preset capture area associated with the advertising screen terminal.
According to the embodiments of the present disclosure, the face image information may be obtained by taking pictures with the camera acquisition device, or by recording video with the camera acquisition device and extracting video frames. The preset capture area associated with the advertising screen terminal may be the area covered by the camera acquisition device installed on the advertising screen terminal.
In operation S302, a face information feature vector is generated based on the face image information.
In the process of implementing the present disclosure, it was found that several attention detection approaches might be adopted. One is recognition based on image information of facial key points; however, this technique relies on the key point information being available, and when the key points are occluded, for example when the face is covered by a mask, the key point information cannot be obtained, the face cannot be recognized accurately, and misjudgment easily results.
Another is recognition based on eye gaze or body posture, such as the deflection angle of the human body or the Euler angles of the eyeball cone; however, the relation between such angles and the distance between the person and the playing device is uncertain, and these quantities are difficult to quantify.
A third is people-flow statistics via a Wi-Fi probe that identifies the MAC addresses of mobile phones carried by nearby users; however, current mobile phone systems hide the user's identity behind virtual MAC addresses, so this approach is being phased out.
According to the embodiments of the present disclosure, the face information feature vector is taken as the basis for evaluating the attention state at the advertising screen terminal. Using the whole face image information of the object as reference avoids the false or failed recognition caused by occluded facial key points; and since neither the deflection angle of the eyeball cone nor the body posture needs to be calculated, the misjudgment caused by quantities that are difficult to quantify is also avoided.
According to the embodiments of the present disclosure, judging based on the whole face image information improves both recognition accuracy and recognition speed.
According to other embodiments of the present disclosure, generating the face information feature vector based on the face image information may include extracting face region features from the face image information to obtain face region image information, and performing vector conversion on the face region image information to generate the face information feature vector.
According to other embodiments of the present disclosure, the operation of generating the face information feature vector may be implemented by a face feature extraction model: the face image information is input into the face feature extraction model, and the output result is the face information feature vector.
According to the embodiments of the present disclosure, the face feature extraction model may be constructed based on a convolutional neural network, but is not limited thereto; other existing models for extracting face features may also be adopted, as long as they can extract features of the whole face region and convert the face region image information of the human face into a face information feature vector.
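As an illustration of the two steps above, the following Python sketch uses OpenCV's stock Haar cascade for the face region extraction and defers the vector conversion to a caller-supplied extractor; neither the detector nor the extractor interface is prescribed by the disclosure, so both are assumptions.

```python
from typing import Optional

import cv2
import numpy as np

# One possible face-region detector; the disclosure only requires that the face
# region be extracted before vector conversion.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def face_feature_vector(frame_bgr: np.ndarray, extractor) -> Optional[np.ndarray]:
    """Crop the face region from a captured frame, then convert it to a face
    information feature vector with `extractor` (any callable mapping a
    227x227 face crop to a 1-D embedding)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                   # no face in the capture area
    x, y, w, h = faces[0]
    face_region = frame_bgr[y:y + h, x:x + w]         # face region image information
    crop = cv2.resize(face_region, (227, 227))        # match the network input size
    return extractor(crop)                            # vector conversion
```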
In operation S303, the face information feature vector is input to the attention recognition model to obtain an output result of the attention recognition model; the concerned identification model is obtained by training according to the sample face information characteristic vector and is used for identifying whether an object concerns the advertising screen terminal.
According to the embodiments of the present disclosure, a convolutional neural network is trained on sample face information feature vectors generated from sample face image information, finally yielding an attention recognition model for recognizing whether an object pays attention to the advertising screen terminal.
In operation S304, determining an attention state evaluation result of the object to be evaluated according to an output result of the attention recognition model; wherein the attention state evaluation result comprises attention and non-attention.
According to the embodiments of the present disclosure, the abstract question of whether a pedestrian (i.e., the object to be evaluated) pays attention to the advertisement can be defined as a binary classification problem, attention versus non-attention, which makes the judgment convenient, fast, and effective.
According to the embodiments of the present disclosure, on the one hand the problem of judging attention to an advertisement is abstracted and simplified; on the other hand, the whole face image information is used as the recognition basis and the attention recognition model is used for recognition, achieving both a high recognition rate and high accuracy.
The method shown in fig. 3 is further described with reference to fig. 4-6 in conjunction with specific embodiments.
FIG. 4 schematically illustrates a training process for a recognition model of interest according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S404.
In operation S401, face image information of a sample object whose attention state result is known is acquired.
According to the embodiments of the present disclosure, the face image information of sample objects with known attention state results may come from an open-source data set, but is not limited thereto; an open-source data set may also be combined with a data set collected from offline scenes, for example desensitized video captured by the camera acquisition device of an advertising screen terminal.
According to the embodiments of the present disclosure, combining an open-source data set with a collected data set yields diverse training samples and improves the likelihood of obtaining an attention recognition model with high recognition accuracy.
In operation S402, a face information feature vector of a sample object is generated based on face image information of the sample object.
In operation S403, the face information feature vector of the sample object is input into an initial convolutional neural network to generate a loss of the loss function, where the parameters of the initial convolutional neural network are preset parameters.
In operation S404, the initial convolutional neural network is trained according to the loss of the loss function until the loss reaches its minimum, and the model corresponding to the minimum loss is taken as the attention recognition model.
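A minimal training-loop sketch of operations S401 to S404 follows; it assumes a network that returns (features, logits) and a combined loss module with learnable parameters, like those sketched later in this description, and the optimizer, epoch count, and learning rate are illustrative assumptions rather than values taken from the disclosure.

```python
import torch

def train_attention_model(model, loss_fn, loader, epochs: int = 50, lr: float = 1e-3):
    """Operations S401-S404: starting from preset (here randomly initialized)
    parameters, compute the loss on sample feature vectors and keep the
    parameters observed at the minimum loss as the attention recognition model."""
    params = list(model.parameters()) + list(loss_fn.parameters())
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9)
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        for images, labels in loader:              # sample objects with known attention state (S401)
            feats, logits = model(images)          # feature vectors of the sample objects (S402)
            loss = loss_fn(feats, logits, labels)  # loss of the loss function (S403)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() < best_loss:            # keep the model at minimum loss (S404)
                best_loss = loss.item()
                best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model
```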
According to the embodiments of the present disclosure, the attention recognition model may adopt a convolutional neural network structure, obtained by continuously training the initial convolutional neural network until the loss of the loss function reaches its minimum.
According to the embodiments of the present disclosure, the optimal parameters of the attention recognition model and the corresponding convolutional neural network structure may be stored in a model file by a serialization technique, so that the advertising screen terminal can run the attention recognition model locally, detached from the back-end server, to implement the method for evaluating an attention state.
According to the embodiments of the present disclosure, storing the attention recognition model directly on the advertising screen terminal creates a model that the terminal can process and use directly, providing a new processing mode. This differs from approaches that depend on server-side computation and require the advertising screen terminal to transmit image information back to a server for processing. Processing the face image information locally and recognizing it through the attention recognition model reduces data transmission traffic and thereby lowers operation and usage costs.
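As one possible reading of this serialization step, the sketch below uses TorchScript to write the structure and optimal parameters into a single model file that the terminal loads locally; TorchScript, the stand-in network, and the file name are assumptions, since the disclosure does not name a particular serialization technique.

```python
import torch

# Stand-in network so the sketch is self-contained; in practice this would be
# the trained attention recognition model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(227 * 227 * 3, 2))
model.eval()

scripted = torch.jit.script(model)       # serialize structure + optimal parameters together
scripted.save("attention_model.pt")      # model file deployed to the advertising screen terminal

# On the terminal: load once and run recognition locally, with no back-end server.
local_model = torch.jit.load("attention_model.pt")
```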
According to an embodiment of the present disclosure, the attention recognition model includes a convolutional neural network, which may include, but is not limited to, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, and a fully connected layer, stacked in sequence.
The structure of the convolutional neural network according to an embodiment of the present disclosure is shown in Table 1 below.

TABLE 1

Layer                   Input size
Convolutional layer 1   227×227×3
Pooling layer 1         55×55×96
Convolutional layer 2   27×27×96
Pooling layer 2         27×27×256
Convolutional layer 3   13×13×256
Convolutional layer 4   13×13×284
Convolutional layer 5   6×6×256
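A PyTorch sketch of a network consistent with the layer order of Table 1 follows. The kernel sizes, strides, and paddings are AlexNet-style assumptions chosen so that the intermediate tensor sizes match the table (reading the 284 in the conv4 row as the AlexNet-typical 384); the feature dimension and two-class head are likewise assumptions.

```python
import torch
import torch.nn as nn

class AttentionRecognitionNet(nn.Module):
    """conv1 -> pool1 -> conv2 -> pool2 -> conv3 -> conv4 -> conv5 -> fully
    connected layer, as listed in Table 1. Comments give each layer's input size."""

    def __init__(self, num_classes: int = 2, feat_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),    # conv1 in: 227x227x3
            nn.MaxPool2d(kernel_size=3, stride=2),                                # pool1 in: 55x55x96
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),  # conv2 in: 27x27x96
            nn.MaxPool2d(kernel_size=3, stride=2),                                # pool2 in: 27x27x256
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True), # conv3 in: 13x13x256
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True), # conv4 in: 13x13x384
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True), # conv5
            nn.MaxPool2d(kernel_size=3, stride=2),                                # out: 6x6x256
        )
        self.fc = nn.Linear(256 * 6 * 6, feat_dim)       # fully connected layer -> feature x_i
        self.head = nn.Linear(feat_dim, num_classes)     # class scores for {attention, non-attention}

    def forward(self, x: torch.Tensor):
        feat = self.fc(self.features(x).flatten(1))
        return feat, self.head(feat)
```

A 227×227 RGB crop thus yields a 256-dimensional feature vector together with the two class scores.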
According to an embodiment of the present disclosure, the attention recognition model maps the neuron outputs computed by the convolutional neural network to the (0, 1) interval, giving a probability for each class.
According to an embodiment of the present disclosure, the loss function includes a combination of a classification (softmax) loss function and a center loss function. The loss function L of the attention recognition model may be the following equation (1):

L = L_S + λ L_C    (1)

L_S = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}}, \quad L_C = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2

where c_{y_i} denotes the class-center feature of the class y_i to which sample i belongs; x_i denotes the feature input to the fully connected layer; m denotes the mini-batch size; n denotes the number of classes; λ is the weight of the center loss term; W_{y_i} and W_j denote the y_i-th and j-th columns of the weight matrix W of the fully connected layer; and b denotes the bias term.
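The following is a minimal PyTorch sketch of equation (1), with learnable class centers as in the usual center-loss formulation; the default λ of 0.1 and the feature dimension are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class SoftmaxCenterLoss(torch.nn.Module):
    """L = L_S + lambda * L_C from equation (1): the softmax (cross-entropy) loss
    over the class scores W_j^T x_i + b_j, plus a center loss pulling each feature
    x_i toward its learnable class center c_{y_i}, both summed over the mini-batch."""

    def __init__(self, num_classes: int, feat_dim: int, lam: float = 0.1):
        super().__init__()
        self.centers = torch.nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lam = lam

    def forward(self, feats, logits, labels):
        loss_s = F.cross_entropy(logits, labels, reduction="sum")    # L_S, summed over i = 1..m
        loss_c = 0.5 * (feats - self.centers[labels]).pow(2).sum()   # L_C = 1/2 sum ||x_i - c_{y_i}||^2
        return loss_s + self.lam * loss_c
```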
According to the embodiments of the present disclosure, with the attention recognition model structure of Table 1 combined with the loss function L of equation (1), the training accuracy reaches 88%-93% in comparative experiments; under the same number of iterations the training accuracy is higher, while memory occupation remains small and prediction time short.
FIG. 5 schematically illustrates an optimization training process for a recognition model of interest according to an embodiment of the present disclosure.
As shown in fig. 5, the method includes operations S501 to S502.
In operation S501, the face information feature vector corresponding to the face image information of an object determined as attention is input into the attention recognition model to generate a loss of the loss function; and
in operation S502, the attention recognition model is optimally trained according to the loss of the loss function until the loss reaches its minimum, and the model corresponding to the minimum loss is taken as the updated attention recognition model.
According to the embodiments of the present disclosure, data collected in practical applications can be used as new training samples to optimally train the attention recognition model and obtain an updated model. For example, during an epidemic, pedestrians go out wearing masks, and wearing a mask changes the facial information of a person to a large degree. Based on such changes in actual conditions, the embodiments of the present disclosure design optimization training for the attention recognition model: samples recognized with higher probability are collected and added to the training data, and the recognition accuracy of the attention recognition model can be improved through multiple iterations.
According to the embodiments of the present disclosure, the attention recognition model thus takes both effectiveness and applicability into account, improving its practical application effect.
FIG. 6 schematically shows a flow chart of a method for evaluating a state of interest according to another embodiment of the present disclosure.
As shown in fig. 6, determining the attention state evaluation result of the object to be evaluated based on the output result of the attention recognition model includes operations S610, S621, and S622.
In operation S610, the output result of the attention recognition model is compared with a first preset threshold, where the output result of the attention recognition model includes the attention probability of the object to be evaluated.
In operation S621, if the attention probability is greater than the first preset threshold, the attention state evaluation result of the object to be evaluated is determined as attention.
In operation S622, if the attention probability is less than or equal to the first preset threshold, the attention state evaluation result of the object to be evaluated is determined as non-attention.
According to the embodiments of the present disclosure, the output result of the attention recognition model includes the class probability of each class. In the embodiments of the present disclosure, the output result includes not only the attention probability of the object to be evaluated but also its non-attention probability, and the two probabilities sum to 1.
According to the embodiments of the present disclosure, the attention probability output by the attention recognition model is compared with the first preset threshold, which makes the comparison and analysis simple, convenient, and fast.
The first preset threshold may be a probability such as 80% or 85%, but is not limited thereto, and may be set according to the actual situation or the required recognition accuracy. The conclusion of attention or non-attention is drawn by comparing the attention probability with the first preset threshold; the higher the attention probability output by the attention recognition model, the higher the model's confidence.
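As an illustration of operations S610, S621, and S622, a minimal sketch follows; it assumes the model outputs two class scores, takes 80% (one of the example values above) as the first preset threshold, and treats index 0 as the attention class, which is an assumed convention.

```python
import torch
import torch.nn.functional as F

FIRST_PRESET_THRESHOLD = 0.80   # illustrative; the text suggests values such as 80% or 85%

def evaluate_attention_state(logits: torch.Tensor) -> str:
    """Map the two class scores to probabilities that sum to 1 (S610), then
    compare the attention probability with the first preset threshold
    (S621/S622)."""
    probs = F.softmax(logits, dim=-1)       # (p_attention, p_non_attention)
    p_attention = probs[0].item()
    return "attention" if p_attention > FIRST_PRESET_THRESHOLD else "non-attention"
```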
In view of this, according to another embodiment of the present disclosure, when the attention recognition model is optimally trained with data collected in practical applications as new training samples, objects for which the model outputs a high attention probability may be used as the training samples.
For example, the output result of the attention recognition model may be compared with a second preset threshold, where the second preset threshold is greater than the first preset threshold; if the attention probability is greater than the second preset threshold, the object to be evaluated is determined as an attention object.
According to other embodiments of the present disclosure, using objects whose attention probability exceeds the second preset threshold as training samples for optimization training not only takes effectiveness and applicability into account, but also improves the recognition accuracy of the attention recognition model by improving the training samples.
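A minimal sketch of collecting objects determined as attention for the optimization training described above; the value 0.95 for the second preset threshold and the label encoding are assumptions, since the disclosure only requires the second threshold to exceed the first.

```python
SECOND_PRESET_THRESHOLD = 0.95  # assumed value; must only be greater than the first threshold

retraining_samples = []         # (feature vector, label) pairs for optimization training

def maybe_collect(feature_vector, p_attention: float) -> None:
    """Keep only high-confidence recognitions: when the attention probability
    exceeds the second preset threshold, the object counts as determined as
    attention and its feature vector is added to the new training data."""
    if p_attention > SECOND_PRESET_THRESHOLD:
        retraining_samples.append((feature_vector, 1))   # label 1 = attention (assumed encoding)
```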
According to an embodiment of the present disclosure, after determining an attention state evaluation result of an object to be evaluated according to an output result of an attention recognition model, the method of the present disclosure may further include the following operations.
determining a first number of objects captured in the preset capture area within a preset time period;
determining, based on the attention state evaluation result of each object, a second number of objects whose attention state evaluation result is attention; and
determining the attention of the advertising screen terminal based on the second number and the first number.
According to the embodiments of the present disclosure, the method for evaluating an attention state can be used to evaluate whether a single object pays attention to the advertisement displayed on the advertising screen terminal, and can also evaluate, among the objects captured in the preset capture area over a period of time, the proportion that pays attention to the advertisement. For example, the attention of a certain advertisement displayed on the advertising screen terminal is determined from the first number of all captured objects and the second number of objects paying attention to the advertisement; more specifically, the attention of the advertising screen terminal is the ratio of the second number to the first number.
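A minimal sketch of this computation; it assumes the per-object evaluation results are collected as strings, and returns the attention of the terminal as the ratio of the second number to the first number.

```python
def screen_attention(results: list) -> float:
    """Attention of the advertising screen terminal over a preset time period:
    the ratio of the second number (objects evaluated as 'attention') to the
    first number (all objects captured in the preset capture area)."""
    first_number = len(results)
    second_number = sum(1 for r in results if r == "attention")
    return second_number / first_number if first_number else 0.0
```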
Fig. 7 schematically shows a block diagram of an apparatus for evaluating a state of interest according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 for evaluating a state of interest includes an acquisition module 710, a generation module 720, an output module 730, and a determination module 740.
An acquisition module 710 for acquiring face image information of an object to be evaluated, where the object to be evaluated is an object captured in a preset capture area associated with the advertising screen terminal.
A generation module 720 for generating a face information feature vector based on the face image information.
An output module 730 for inputting the face information feature vector into the attention recognition model to obtain an output result of the attention recognition model, where the attention recognition model is trained on sample face information feature vectors and is used to recognize whether an object pays attention to the advertising screen terminal.
A determination module 740 for determining an attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, where the attention state evaluation result is either attention or non-attention.
According to an embodiment of the present disclosure, the generation module 720 includes an extraction unit and a generation unit.
An extraction unit for extracting face region features from the face image information to obtain face region image information of the human face.
A generation unit for performing vector conversion on the face region image information to generate the face information feature vector.
According to an embodiment of the present disclosure, the determination module 740 includes a first comparison unit for comparing the output result of the attention recognition model with a first preset threshold, where the output result of the attention recognition model includes the attention probability of the object to be evaluated.
According to an embodiment of the present disclosure, if the attention probability is greater than the first preset threshold, the attention state evaluation result of the object to be evaluated is determined as attention.
According to an embodiment of the present disclosure, if the attention probability is less than or equal to the first preset threshold, the attention state evaluation result of the object to be evaluated is determined as non-attention.
According to an embodiment of the present disclosure, the determination module 740 includes a second comparison unit for comparing the output result of the attention recognition model with a second preset threshold, where the second preset threshold is greater than the first preset threshold; if the attention probability is greater than the second preset threshold, the object to be evaluated is determined as an attention object.
According to an embodiment of the present disclosure, the apparatus for evaluating an attention state further includes an attention determination module for determining a first number of objects captured in the preset capture area within a preset time period; determining, based on the attention state evaluation result of each object, a second number of objects whose attention state evaluation result is attention; and determining the attention of the advertising screen terminal based on the second number and the first number.
According to an embodiment of the present disclosure, the apparatus for evaluating an attention state further includes a training module for training on sample face information feature vectors using a loss function to obtain the attention recognition model.
According to an embodiment of the present disclosure, the training module acquires face image information of a sample object whose attention state result is known; generates a face information feature vector of the sample object based on the face image information of the sample object; inputs the face information feature vector of the sample object into an initial convolutional neural network to generate a loss of the loss function, where the parameters of the initial convolutional neural network are preset parameters; and trains the initial convolutional neural network according to the loss of the loss function until the loss reaches its minimum, taking the model corresponding to the minimum loss as the attention recognition model.
According to an embodiment of the present disclosure, the attention recognition model includes a convolutional neural network, which may include, but is not limited to, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, and a fully connected layer, stacked in sequence.
According to an embodiment of the present disclosure, the loss function includes a combination of a classification (softmax) loss function and a center loss function.
According to an embodiment of the present disclosure, the loss function L of the attention recognition model may be equation (1) above:

L = L_S + λ L_C

where c_{y_i} denotes the class-center feature of the class y_i to which sample i belongs; x_i denotes the feature input to the fully connected layer; m denotes the mini-batch size; n denotes the number of classes; λ is the weight of the center loss term; W_{y_i} and W_j denote the y_i-th and j-th columns of the weight matrix W of the fully connected layer; and b denotes the bias term.
According to an embodiment of the present disclosure, the apparatus for evaluating an attention state further includes an optimization training module for inputting the face information feature vector corresponding to the face image information of an object determined as attention into the attention recognition model to generate a loss of the loss function, and for performing optimization training on the attention recognition model according to the loss of the loss function until the loss reaches its minimum, taking the model corresponding to the minimum loss as the updated attention recognition model.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the obtaining module 710, the generating module 720, the outputting module 730, and the determining module 740 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the obtaining module 710, the generating module 720, the outputting module 730, and the determining module 740 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the obtaining module 710, the generating module 720, the outputting module 730, and the determining module 740 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
It should be noted that, a portion of the apparatus for evaluating the attention state in the embodiment of the present disclosure corresponds to a portion of the method for evaluating the attention state in the embodiment of the present disclosure, and a description of the portion of the apparatus for evaluating the attention state specifically refers to the portion of the method for evaluating the attention state, and is not described herein again.
FIG. 8 schematically illustrates a block diagram of a computer system suitable for implementing the above-described method, according to an embodiment of the present disclosure. The computer system illustrated in FIG. 8 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in fig. 8, a computer system 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the system 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
System 800 may also include an input/output (I/O) interface 805, also connected to bus 804, according to an embodiment of the disclosure. The system 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read therefrom is installed into the storage section 808 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than the ROM 802 and RAM 803.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the method provided by the embodiments of the present disclosure; when the computer program product is run on an electronic device, the program code causes the electronic device to carry out the method for assessing a state of interest provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 801, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via communication section 809, and/or installed from removable media 811. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (15)

1. A method for evaluating an attention state, applied to an advertising screen terminal, the method comprising:
acquiring face image information of an object to be evaluated, wherein the object to be evaluated is an object collected in a preset collection area associated with the advertising screen terminal;
generating a face information feature vector based on the face image information;
inputting the face information feature vector into an attention recognition model to obtain an output result of the attention recognition model, wherein the attention recognition model is obtained by training according to sample face information feature vectors and is used for identifying whether an object pays attention to the advertising screen terminal; and
determining an attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model; wherein the attention state evaluation result comprises attention and non-attention.
2. The method of claim 1, wherein:
the attention identification model comprises a convolutional neural network;
the convolutional neural network comprises a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer and a full-connection layer which are sequentially stacked.
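For illustration, the layer ordering recited in this claim (which echoes the AlexNet convolutional stem) could be realized as follows; this is a sketch assuming a PyTorch implementation, and the kernel sizes, channel widths, and the 3×112×112 input resolution are assumptions not specified by the claim.

```python
import torch
import torch.nn as nn

class AttentionNet(nn.Module):
    """Sequentially stacked conv1, pool1, conv2, pool2, conv3, conv4, conv5, fc."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),  # first convolutional layer
            nn.MaxPool2d(kernel_size=3, stride=2),                            # first pooling layer
            nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(),          # second convolutional layer
            nn.MaxPool2d(kernel_size=3, stride=2),                            # second pooling layer
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),         # third convolutional layer
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),         # fourth convolutional layer
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),         # fifth convolutional layer
        )
        self.fc = nn.Linear(256 * 13 * 13, num_classes)  # fully connected layer, for a 3x112x112 input

    def forward(self, x: torch.Tensor):
        feats = torch.flatten(self.features(x), 1)  # x_i, the input to the fully connected layer
        return feats, self.fc(feats)                # (features, logits)
```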
3. The method of claim 1, wherein determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model comprises:
comparing the output result of the attention recognition model with a first preset threshold; wherein the output result of the attention recognition model comprises the attention probability of the object to be evaluated;
if the attention probability is larger than the first preset threshold, determining that the attention state evaluation result of the object to be evaluated is attention; and
if the attention probability is smaller than or equal to the first preset threshold, determining that the attention state evaluation result of the object to be evaluated is non-attention.
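A minimal sketch of this comparison step; the concrete value 0.5 is an assumption, since the claim only requires some first preset threshold.

```python
def evaluate_attention_state(attention_prob: float, first_threshold: float = 0.5) -> str:
    # the output result of the attention recognition model is the attention probability
    return "attention" if attention_prob > first_threshold else "non-attention"
```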
4. The method according to claim 1, wherein after determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, the method further comprises:
determining a first number of objects acquired in a preset acquisition area within a preset time period;
determining, based on the attention state evaluation result of each of the objects, a second number of objects whose attention state evaluation result is attention; and
determining the attention of the advertising screen terminal based on the second number and the first number.
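One natural reading of determining the attention of the screen "based on the second number and the first number" is their ratio; the following sketch makes that assumption (the claim itself does not fix the formula).

```python
def screen_attention(evaluations: list[str]) -> float:
    first_number = len(evaluations)  # all objects collected in the preset area and period
    second_number = sum(1 for result in evaluations if result == "attention")
    return second_number / first_number if first_number else 0.0
```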
5. The method of claim 3, wherein the attention recognition model is obtained by training with a loss function according to sample face information feature vectors, the training process comprising:
acquiring the face image information of a sample object with a known attention state result;
generating a face information feature vector of the sample object based on the face image information of the sample object;
inputting the face information feature vector of the sample object into an initial convolutional neural network to generate a loss of a loss function, wherein the parameters of the initial convolutional neural network are preset parameters; and
training the initial convolutional neural network according to the loss of the loss function until the loss of the loss function reaches a minimum, and taking the model corresponding to the minimum loss as the attention recognition model.
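A minimal training sketch for this procedure, reusing the AttentionNet and SoftmaxCenterLoss sketches above; the optimizer, learning rate, epoch count, and the per-batch "keep the best" stopping rule are assumptions, since the claim only requires training until the loss is minimal and keeping the corresponding model.

```python
import copy
import torch

def train_attention_model(model, criterion, loader, epochs: int = 10, lr: float = 1e-3):
    # optimize both the network weights and the learnable class centers
    params = list(model.parameters()) + list(criterion.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        for images, labels in loader:  # sample face image information with known attention states
            features, logits = model(images)  # x_i and W^T x_i + b
            loss = criterion(features, logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if loss.item() < best_loss:  # keep the model corresponding to the minimal loss
                best_loss = loss.item()
                best_state = copy.deepcopy(model.state_dict())
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```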
6. The method according to claim 5, wherein after determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, the method further comprises:
inputting the face information feature vector corresponding to the face image information of an object determined as paying attention into the attention recognition model to generate a loss of the loss function; and
performing optimization training on the attention recognition model according to the loss of the loss function until the loss reaches a minimum, and taking the model corresponding to the minimum loss as the updated attention recognition model.
7. The method of claim 6, wherein determining the attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model comprises:
comparing the output result of the attention recognition model with a second preset threshold, wherein the second preset threshold is greater than the first preset threshold; and
if the attention probability is larger than the second preset threshold, determining that the object to be evaluated is an object determined as paying attention.
8. The method of claim 5, wherein the loss function comprises a combination of a classification softmax loss function and a center loss function.
9. The method of claim 8, wherein the loss function is as follows:
$$L = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_{j}^{T} x_i + b_j}} + \frac{\lambda}{2} \sum_{i=1}^{m} \left\| x_i - C_{y_i} \right\|_2^2$$
wherein C_yi represents the class center feature of the class y_i to which the sample belongs; x_i represents the feature input to the fully connected layer; m represents the size of the mini-batch; n represents the number of categories; λ is the weight of the loss function L; W_yi and W_j respectively represent the y_i-th column and the j-th column of the weight matrix W of the fully connected layer; and b represents a bias term.
10. The method of claim 1, wherein generating the face information feature vector based on the face image information comprises:
extracting facial region features of the facial image information to obtain facial region image information; and
carrying out vector conversion on the facial region image information to generate the face information feature vector.
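A minimal sketch of these two steps, assuming OpenCV's bundled Haar cascade for the face-region extraction and a simple flatten-and-normalize vector conversion; the detector choice, the 112×112 resize, and the normalization are assumptions not fixed by the claim.

```python
import cv2
import numpy as np

_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_information_feature_vector(image_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face found in the collected image")
    x, y, w, h = faces[0]  # facial region image information
    face = cv2.resize(image_bgr[y:y + h, x:x + w], (112, 112))
    # vector conversion: flatten the normalized face region into a feature vector
    return (face.astype(np.float32) / 255.0).reshape(-1)
```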
11. An apparatus for evaluating a state of interest, comprising:
an acquisition module, configured to acquire face image information of an object to be evaluated, wherein the object to be evaluated is an object collected in a preset collection area associated with the advertising screen terminal;
a generating module, configured to generate a face information feature vector based on the face image information;
an output module, configured to input the face information feature vector into an attention recognition model to obtain an output result of the attention recognition model, wherein the attention recognition model is obtained by training according to sample face information feature vectors and is used for identifying whether an object pays attention to the advertising screen terminal; and
a determination module, configured to determine an attention state evaluation result of the object to be evaluated according to the output result of the attention recognition model, wherein the attention state evaluation result comprises attention and non-attention.
12. The apparatus of claim 11, further comprising:
a training module, configured to acquire face image information of a sample object with a known attention state result; generate a face information feature vector of the sample object based on the face image information of the sample object; input the face information feature vector of the sample object into an initial convolutional neural network to generate a loss of a loss function, wherein the parameters of the initial convolutional neural network are preset parameters; and train the initial convolutional neural network according to the loss of the loss function until the loss reaches a minimum, taking the model corresponding to the minimum loss as the attention recognition model.
13. A computer system, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-10.
14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 10.
15. A computer program product, comprising:
computer executable instructions which, when executed, implement the method of any one of claims 1 to 10.
CN202011572400.1A 2020-12-25 2020-12-25 Methods, apparatus, systems, media and products for assessing a state of interest Pending CN112560783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011572400.1A CN112560783A (en) 2020-12-25 2020-12-25 Methods, apparatus, systems, media and products for assessing a state of interest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011572400.1A CN112560783A (en) 2020-12-25 2020-12-25 Methods, apparatus, systems, media and products for assessing a state of interest

Publications (1)

Publication Number Publication Date
CN112560783A true CN112560783A (en) 2021-03-26

Family

ID=75033466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011572400.1A Pending CN112560783A (en) 2020-12-25 2020-12-25 Methods, apparatus, systems, media and products for assessing a state of interest

Country Status (1)

Country Link
CN (1) CN112560783A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740466A (en) * 2018-12-24 2019-05-10 中国科学院苏州纳米技术与纳米仿生研究所 Acquisition methods, the computer readable storage medium of advertisement serving policy
WO2020186883A1 (en) * 2019-03-18 2020-09-24 北京市商汤科技开发有限公司 Methods, devices and apparatuses for gaze area detection and neural network training
CN110866471A (en) * 2019-10-31 2020-03-06 Oppo广东移动通信有限公司 Face image quality evaluation method and device, computer readable medium and communication terminal
CN110989846A (en) * 2020-03-04 2020-04-10 支付宝(杭州)信息技术有限公司 Information processing method, device, equipment and medium
CN111353461A (en) * 2020-03-11 2020-06-30 京东数字科技控股有限公司 Method, device and system for detecting attention of advertising screen and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xing Yanchao; Qiang Wenping: "Advertising effect evaluation system based on face detection and tracking", Journal of Computer Applications (计算机应用), no. 10 *

Similar Documents

Publication Publication Date Title
CN109240576B (en) Image processing method and device in game, electronic device and storage medium
CN111476309B (en) Image processing method, model training method, device, equipment and readable medium
US20220188840A1 (en) Target account detection method and apparatus, electronic device, and storage medium
US11055516B2 (en) Behavior prediction method, behavior prediction system, and non-transitory recording medium
CN109614934B (en) Online teaching quality assessment parameter generation method and device
CN112258512B (en) Point cloud segmentation method, device, equipment and storage medium
US11443438B2 (en) Network module and distribution method and apparatus, electronic device, and storage medium
US11087140B2 (en) Information generating method and apparatus applied to terminal device
CN110059623B (en) Method and apparatus for generating information
CN110059624B (en) Method and apparatus for detecting living body
CN110287816B (en) Vehicle door motion detection method, device and computer readable storage medium
CN111652087A (en) Car checking method and device, electronic equipment and storage medium
CN114663871A (en) Image recognition method, training method, device, system and storage medium
CN110008926B (en) Method and device for identifying age
CN111783674A (en) Face recognition method and system based on AR glasses
CN113742590A (en) Recommendation method and device, storage medium and electronic equipment
WO2019062411A1 (en) Method for managing and controlling background application program, storage medium, and electronic device
CN112507884A (en) Live content detection method and device, readable medium and electronic equipment
CN112560783A (en) Methods, apparatus, systems, media and products for assessing a state of interest
CN116342940A (en) Image approval method, device, medium and equipment
CN111832354A (en) Target object age identification method and device and electronic equipment
CN110827261B (en) Image quality detection method and device, storage medium and electronic equipment
CN111259689B (en) Method and device for transmitting information
CN116366961A (en) Video conference method and device and computer equipment
CN112070022A (en) Face image recognition method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co., Ltd