CN111126179A - Information acquisition method and device, storage medium and electronic device

Information acquisition method and device, storage medium and electronic device

Info

Publication number
CN111126179A
CN111126179A (application CN201911239754.1A)
Authority
CN
China
Prior art keywords
clothing
information
training
video
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911239754.1A
Other languages
Chinese (zh)
Inventor
李冠楠 (Li Guannan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911239754.1A priority Critical patent/CN111126179A/en
Publication of CN111126179A publication Critical patent/CN111126179A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/48 Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an information acquisition method and device, a storage medium, and an electronic device, wherein the method comprises the following steps: extracting a key frame from a video to be detected; detecting, in the frame image of the key frame, a first clothing region containing a first clothing object; extracting clothing features of the first clothing object from the first clothing region, wherein the clothing features of the first clothing object comprise at least one of the following: color information of the first clothing object, and posture information of the first clothing object; in a case where a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, acquiring target clothing information corresponding to the second clothing object contained in the second clothing region, wherein each reference clothing region contains at least one clothing object; and performing region tracking on the first clothing region in the video frame sequence of the video to be detected, and determining occurrence information of the first clothing object in the video to be detected.

Description

Information acquisition method and device, storage medium and electronic device
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for acquiring information, a storage medium, and an electronic apparatus.
Background
At present, when clothing appears in a video program (for example, a variety show), commodity information of the same-style clothing item is added so as to provide the user with a convenient purchase entrance for that item, meeting the user's demand for buying the same or a similar style of clothing.
In order to obtain information on the clothing goods included in a video program, the clothing styles appearing in the video must be identified. Clothing identification methods in the related art generally identify the clothing style using clothing feature point information, such as the position information of the cuffs, collar, and hem. However, this kind of clothing recognition has limited ability to express clothing features far away from the feature point regions and is prone to overfitting during training, so the performance of the algorithm in real scenes is limited and the clothing recognition accuracy is low.
Disclosure of Invention
The embodiments of the present application provide an information acquisition method and device, a storage medium, and an electronic device, so as to at least solve the problem in the related art that weak clothing feature expression capability leads to low clothing identification accuracy.
According to an aspect of an embodiment of the present application, there is provided an information acquisition method, including: extracting a key frame from a video to be detected; detecting, in the frame image of the key frame, a first clothing region containing a first clothing object; extracting clothing features of the first clothing object from the first clothing region, wherein the clothing features of the first clothing object comprise at least one of the following: color information of the first clothing object, and posture information of the first clothing object; in a case where a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, acquiring target clothing information corresponding to the second clothing object contained in the second clothing region, wherein each reference clothing region contains at least one clothing object; and performing region tracking on the first clothing region in the video frame sequence of the video to be detected, and determining occurrence information of the first clothing object in the video to be detected.
According to another aspect of the embodiments of the present application, there is provided an information acquisition apparatus, including: a first extraction unit, configured to extract a key frame from a video to be detected; a detecting unit, configured to detect, in the frame image of the key frame, a first clothing region containing a first clothing object; a second extraction unit, configured to extract clothing features of the first clothing object from the first clothing region, where the clothing features of the first clothing object include at least one of: color information of the first clothing object, and posture information of the first clothing object; a first obtaining unit, configured to, in a case where a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, obtain target clothing information corresponding to the second clothing object contained in the second clothing region, where each reference clothing region contains at least one clothing object; and a first determining unit, configured to perform region tracking on the first clothing region in the video frame sequence of the video to be detected and determine occurrence information of the first clothing object in the video to be detected.
Optionally, the first extraction unit includes: a first extraction module, configured to extract key frames from the video to be detected at a target interval; or, a second extraction module, configured to extract, from each shot contained in the video to be detected, a key frame corresponding to that shot.
Optionally, the second extraction unit includes: an input module, configured to input the first clothing region into a first feature extraction model to obtain the clothing features of the first clothing object output by the first feature extraction model, where the first feature extraction model is obtained by training a first initial model with first training samples, each first training sample being an image labeled with the first clothing features of a first training clothing object.
Optionally, the apparatus further comprises: a second obtaining unit, configured to obtain first clothing features of the first training clothing object before the first clothing region is input into the first feature extraction model and the clothing features of the first clothing object output by the first feature extraction model are obtained, where the first clothing features include at least one of: first color information of the first training clothing object and first posture information of the first training clothing object; and a first training unit, configured to train the first initial model with the first training samples to obtain the first feature extraction model, where the similarity between the first clothing features and second clothing features extracted from the first training samples by the first feature extraction model is greater than or equal to a first threshold, the second clothing features including at least one of: second color information of the first training clothing object and second posture information of the first training clothing object.
Optionally, the second obtaining unit includes at least one of: an acquisition module, configured to perform histogram calculation and clustering calculation on the first training samples to acquire the first color information; and a first determining module, configured to determine the first posture information according to the labeled first position information of the feature points of the first training clothing object and the labeled visibility information of those feature points.
Optionally, the apparatus further comprises: a third obtaining unit, configured to, after the clothing features of the first clothing object are extracted from the first clothing region, input the clothing features of the first clothing object into a second feature extraction model and acquire target features of the first clothing object output by the second feature extraction model, where the second feature extraction model is obtained by training a second initial model with second training samples, each second training sample being an image labeled with the clothing features of a second training clothing object and the same-style identification of that second training clothing object, the same-style identification being used to identify second training clothing objects of the same style; the similarity between target features of same-style second training clothing objects extracted by the second feature extraction model is greater than or equal to a second threshold, and the similarity between target features of different-style second training clothing objects extracted by the second feature extraction model is smaller than the second threshold; a fourth obtaining unit, configured to obtain, from the plurality of reference clothing regions, candidate clothing regions whose target features match the target features of the first clothing object; and a second determining unit, configured to determine, from the candidate clothing regions, the second clothing region whose clothing features match the clothing features of the first clothing object.
Optionally, the apparatus further comprises: an input unit, configured to input the second training samples into the first feature extraction model, before the clothing features of the first clothing object are input into the second feature extraction model and the target features of the first clothing object output by the second feature extraction model are acquired, so as to obtain the clothing features of the second training clothing object output by the first feature extraction model; and a second training unit, configured to train the second initial model with the clothing features of the second training clothing object and the same-style information of the second training clothing object, to obtain the second feature extraction model.
Optionally, the first determining unit comprises a second determining module, configured to perform region detection, according to the first clothing region, on video frames before the key frame and video frames after the key frame in the video to be detected, and determine time period information and position information of the first clothing object appearing in the video to be detected, where the occurrence information includes the time period information and the position information. Optionally, the apparatus further comprises an adding unit, configured to add control information to the video to be detected after the first clothing region is region-tracked in the video frame sequence of the video to be detected and the occurrence information of the first clothing object is determined, where the control information is used to control the target clothing information to be displayed, in the form of a pop-up window, at the position corresponding to the position information in the video to be detected when the video to be detected is played to the time period corresponding to the time period information.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to carry out the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the embodiments of the present application, clothing identification is performed by combining color information and/or posture information: a key frame is extracted from the video to be detected; a first clothing region containing a first clothing object is detected in the frame image of the key frame; clothing features of the first clothing object, comprising at least one of the color information and the posture information of the first clothing object, are extracted from the first clothing region; in a case where a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, target clothing information corresponding to the second clothing object contained in the second clothing region is acquired, where each reference clothing region contains at least one clothing object; and the first clothing region is region-tracked in the video frame sequence of the video to be detected to determine occurrence information of the first clothing object. Because clothing identification combines the color information and/or posture information of the clothing object, the expression capability of the clothing features for the clothing object is increased, achieving the technical effect of improving clothing identification accuracy and solving the problem in the related art that weak clothing expression capability leads to low clothing identification accuracy.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it will be apparent to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
FIG. 1 is a block diagram of an alternative server hardware configuration according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an alternative information obtaining method according to an embodiment of the present application;
fig. 3 is a schematic flow chart diagram illustrating another alternative information acquisition method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating an alternative model training process according to an embodiment of the present application;
FIG. 5 is a flow diagram illustrating the construction of an alternative apparel feature database in accordance with an embodiment of the present application;
FIG. 6 is a flow chart illustrating an alternative process for identifying apparel in a video according to an embodiment of the present application; and
fig. 7 is a block diagram of an alternative information acquisition apparatus according to an embodiment of the present application.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
According to one aspect of the embodiments of the present application, an information acquisition method is provided. Alternatively, the method may be executed on a server (a server of a video content playing platform), a user terminal, or a similar computing device. Taking the method running on a server as an example, fig. 1 is a block diagram of the hardware structure of an optional server according to an embodiment of the present application. As shown in fig. 1, the server 10 may include one or more processors 102 (only one is shown in fig. 1; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the server. For example, the server 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the information obtaining method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to server 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 10. In one example, the transmission device 106 includes a NIC (Network Interface Controller) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be an RF (Radio Frequency) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, an information acquisition method running on the server is provided. Fig. 2 is a flowchart of an optional information acquisition method according to an embodiment of the present application; as shown in fig. 2, the flow includes the following steps:
Step S202, extracting a key frame from a video to be detected;
Step S204, detecting, in the frame image of the key frame, a first clothing region containing a first clothing object;
Step S206, extracting clothing features of the first clothing object from the first clothing region, wherein the clothing features of the first clothing object comprise at least one of the following: color information of the first clothing object, and posture information of the first clothing object;
Step S208, in a case where a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, acquiring target clothing information corresponding to the second clothing object contained in the second clothing region, wherein each reference clothing region contains at least one clothing object;
Step S210, performing region tracking on the first clothing region in the video frame sequence of the video to be detected, and determining occurrence information of the first clothing object in the video to be detected.
Alternatively, the execution subject of the above steps may be a server, a user terminal, etc., but is not limited thereto.
Through this embodiment, clothing identification is performed by combining the color information and/or posture information of the clothing object. This increases the expression capability of the clothing features for the clothing object, solves the problem in the related art that weak clothing expression capability leads to low clothing identification accuracy, and improves the clothing identification accuracy.
The following describes a method for acquiring information in the embodiment of the present application with reference to fig. 2.
In step S202, key frames are extracted from the video to be detected.
For a program video of a video program (e.g., a variety program), a user may view the program video through a means such as a client, a web page, etc.
In order to display clothing information (for example, clothing commodity information) corresponding to a clothing object in the program video, the clothing information may be added in advance, before the program video is played; alternatively, while the program video is being played, the clothing information corresponding to the clothing objects contained in it may be determined and acquired in real time.
When acquiring clothing information, a key frame of the video to be detected (for example, a program video) is first extracted, so that the frame image of the extracted key frame can be processed. The key frame of the video to be detected can be extracted in various ways, including but not limited to one of the following: extraction at equal intervals, and extraction by shot.
As an alternative embodiment, the extracting key frames from the video to be detected includes: extracting key frames from a video to be detected according to a target interval; or extracting key frames corresponding to the shots from the shots contained in the video to be detected.
As an alternative implementation, the key frames may be extracted from the video to be detected at a target interval. The target interval may be a target time interval, for example, one video frame may be extracted every 2 s (which may be set or modified as needed) as the current key frame, or a predetermined number of video frames, for example, one video frame may be extracted every 50 video frames (which may be set or modified as needed) as the current key frame. The specific manner of extracting the key frames at equal intervals may be set as needed, and is not specifically limited in this embodiment.
As another alternative, key frames corresponding to shots may be extracted from shots included in the video to be detected. The video to be detected may include a plurality of shots, wherein, in video frames within the same shot, the similarity of adjacent video frames may be greater than or equal to a first threshold. For each shot, one or more video frames may be extracted as key frames. The specific manner for determining the shots included in the video to be detected and the manner for extracting the key frames from each shot may be set as required (e.g., according to the similarity of adjacent video frames), which is not specifically limited in this embodiment.
Through this embodiment, extracting the key frames of the video to be detected at a specific interval, or extracting the key frames of the video to be detected by shot, ensures the reasonableness of key frame extraction and improves key frame extraction efficiency.
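For illustration, the two extraction strategies can be sketched in Python as follows (a minimal sketch assuming OpenCV; the 50-frame interval and the histogram-correlation threshold for shot boundaries are illustrative assumptions, not values fixed by this application):

    import cv2

    def extract_key_frames(video_path, mode="interval", frame_interval=50, shot_threshold=0.5):
        """Extract key frames either at a fixed frame interval or one per shot.

        mode="interval": keep one frame every `frame_interval` frames.
        mode="shot": keep the first frame of each shot, where a shot boundary
        is assumed whenever the HSV histogram correlation between adjacent
        frames drops below `shot_threshold`.
        """
        cap = cv2.VideoCapture(video_path)
        key_frames, prev_hist, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if mode == "interval":
                if idx % frame_interval == 0:
                    key_frames.append((idx, frame))
            else:  # mode == "shot"
                hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
                hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
                cv2.normalize(hist, hist)
                if prev_hist is None or cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < shot_threshold:
                    key_frames.append((idx, frame))  # first frame of a new shot
                prev_hist = hist
            idx += 1
        cap.release()
        return key_frames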
In step S204, a first clothing region containing a first clothing object is detected in the frame image of the key frame.
For each extracted key frame, the frame image of the key frame can be processed in turn, and the first clothing region containing the first clothing object in the frame image is determined. The first clothing region may be a quadrilateral region containing the first clothing object. The first clothing region may be detected in combination with related technologies, for example, using a fashion-detection algorithm, which is not specifically limited here.
For example, the keyframe calculation may be performed on the video to be processed, and the clothing detection may be performed on each keyframe, so as to obtain clothing regions of each keyframe.
In step S206, the clothing features of the first clothing object are extracted from the first clothing region, wherein the clothing features of the first clothing object comprise at least one of: color information of the first clothing object, and posture information of the first clothing object.
For the detected first clothing region, the clothing features of the first clothing object may be extracted from it. Clothing features are used to represent clothing objects and may include one or more features; the clothing features of different clothing objects may be the same or different.
The clothing features may include at least one of: color information and posture information. The color information of a clothing object may be the color information of the clothing object as a whole, representing its overall color (dominant hue), and the posture information of a clothing object may represent its current posture. In addition to color information and/or posture information, the clothing features may include at least one of the following: clothing category information, style information, and feature point information, wherein,
(1) the clothing category information represents the category of the clothing object, for example, short-sleeved T-shirts, long-sleeved shirts, and the like;
(2) the style information represents the style of the clothing object, such as finer-grained characteristics of texture, material, shape, and so on;
(3) the feature point information represents the position information of the feature points of the clothing object; the feature points may include two feature points each at the cuffs, the collar, the hem, and the waist.
As for the posture information, the posture information of a clothing object may represent its current posture, e.g., front (the front of the clothing object faces the lens), back (the back faces the lens), side (a side faces the lens), and so on. If the current posture of the clothing object is a side view, the posture information may also indicate the angle by which the clothing object is turned relative to the upright front view. For example, the posture information of the clothing object may represent: the clothing object is turned sideways by 30°.
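For illustration, the clothing features described above can be gathered into a simple data structure. The following is a minimal Python sketch; the field names and value types are assumptions for illustration and are not prescribed by this application:

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class ClothingFeature:
        # Dominant color of the clothing object as a whole, e.g. an (R, G, B) tuple.
        color: Optional[Tuple[int, int, int]] = None
        # Posture of the clothing object: "front", "back", or "side", plus the
        # turn angle in degrees when the posture is a side view.
        posture: Optional[str] = None
        turn_angle_deg: Optional[float] = None
        # Further features named in the description above.
        category: Optional[str] = None  # e.g. "short-sleeved T-shirt"
        style: Optional[str] = None     # finer-grained texture/material/shape style
        # Eight feature points, two each at the cuffs, collar, hem, and waist;
        # each entry is ((x, y), visible).
        feature_points: List[Tuple[Tuple[float, float], bool]] = field(default_factory=list)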
As an alternative embodiment, extracting the clothing features of the first clothing object from the first clothing region includes: inputting the first clothing region into a first feature extraction model to obtain the clothing features of the first clothing object output by the first feature extraction model, wherein the first feature extraction model is obtained by training a first initial model with first training samples, each first training sample being an image labeled with the first clothing features of a first training clothing object.
For the first clothing region, clothing features of the first clothing object may be extracted using a first feature extraction model. The first clothing region may be input to the first feature extraction model, and an output result of the first feature extraction model may be obtained, and the output result may be used as a clothing feature of the first clothing object included in the first clothing region.
The first feature extraction model may be obtained by training a first initial model (a first initial feature extraction model) with first training samples, each of which is an image labeled with the first clothing features of the first training clothing object it contains.
Through this embodiment, extracting the clothing features of the first clothing object with the trained first feature extraction model improves both the efficiency and the accuracy of clothing feature extraction.
Before using the first feature extraction model, a first initial model may be trained using a first training sample to obtain the first feature extraction model.
As an alternative embodiment, before the first clothing region is input to the first feature extraction model, and the clothing feature of the first clothing object output by the first feature extraction model is obtained, the first clothing feature of the first training clothing object may be obtained, where the first clothing feature includes at least one of: first color information of a first training clothing object and first pose information of the first training clothing object; training the first initial model by using a first training sample to obtain a first feature extraction model, wherein the similarity between a second clothing feature extracted from the first training sample by the first feature extraction model and the first clothing feature is greater than or equal to a first threshold, and the second clothing feature comprises at least one of the following features: second color information of the first training clothing object and second pose information of the first training clothing object.
For a first training sample, a first apparel feature of a first training apparel object contained in the first training sample may be obtained, and the first apparel feature may be at least one of: first color information of a first training apparel object; first pose information of a first training apparel object. The first apparel characteristic may also include at least one of: the first training clothing object comprises first clothing category information of the first training clothing object, first style information of the first training clothing object and first position information of feature points of the first training clothing object.
After the first clothing features of the first training clothing object are acquired, the first initial model is trained with the first training samples to obtain the first feature extraction model. During model training, the first training samples can be fed into the first initial model iteratively to obtain the detection results output by the first initial model (the clothing features of the first training clothing object); the model parameters of the first initial model are adjusted according to the degree of similarity between the clothing features output by the first initial model and the labeled first clothing features, so as to increase that similarity; training is determined to be complete when a convergence condition (the objective function of model training) is satisfied, yielding the first feature extraction model.
The similarity between the first clothing features and the second clothing features extracted from the first training samples by the first feature extraction model is greater than or equal to a first threshold, and the second clothing features may include: second color information of the first training clothing object and second posture information of the first training clothing object. For example, the first threshold may be the similarity between the clothing features extracted from the first training samples by the adjusted first initial model and the first clothing features at the point where the requirement of the objective function is satisfied.
The server using the first feature extraction model and the server training the first initial model may be the same server or different servers.
Through this embodiment, training the first initial model with the first training samples to obtain the first feature extraction model improves the ability of the trained first feature extraction model to detect the clothing features of clothing objects.
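For illustration only, the iterative training described above can be sketched as follows; the use of PyTorch, the tiny network layout, and the cosine-similarity loss are assumptions, not details fixed by this application:

    import torch
    import torch.nn as nn

    class FirstInitialModel(nn.Module):
        """A minimal CNN mapping a clothing-region image to a feature vector."""
        def __init__(self, feature_dim=128):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feature_dim),
            )

        def forward(self, x):
            return self.backbone(x)

    def train_first_model(model, loader, epochs=10, lr=1e-3):
        """Adjust the model parameters until the extracted features approach
        the labeled first clothing features (cosine similarity as criterion)."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for images, labeled_features in loader:
                predicted = model(images)
                # Push the similarity to the labeled feature vectors toward 1.
                loss = 1 - nn.functional.cosine_similarity(predicted, labeled_features).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
        return model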
The first clothing features of the first training clothing object can be acquired in various ways, for example, by manual labeling or by machine labeling.
As an alternative embodiment, obtaining the first apparel characteristic of the first training apparel object comprises at least one of: performing histogram calculation and clustering calculation on the first training sample to obtain first color information; and determining first posture information according to the marked first position information of the characteristic points of the first training clothing object and the marked visibility information of the characteristic points.
For the color information of the first training clothing object, histogram calculation may be performed on the first training sample to obtain a histogram of the first training sample. And then clustering the pixel points of the first training sample according to the histogram, and taking the color information of the target class which contains the most pixel points in the multiple classes obtained after clustering as the color information of the first training clothing object.
And for the attitude information of the first training clothing object, determining the first attitude information of the first training clothing object according to the marked first position information of the feature points of the first training clothing object and the marked visibility information of the feature points. The visibility information of the feature point is used to indicate whether the feature point is visible.
According to the labeled first position information of the feature points of the first training clothing object and the labeled visibility information of the feature points, the image spatial attention mask over target position areas of the first training clothing object (for example, areas likely to have rich texture, such as the chest and the front of the legs) can be weighted to obtain processed first training samples, and the processed first training samples are used to train the first initial model.
For example, the first feature extraction model may be a clothing image feature expression model. Histogram calculation and clustering calculation may be performed on a large-scale data set containing clothing categories, style sets, and feature point information, mining the clothing color information (the first color information) in a semi-automatic data collection manner; the posture of the current clothing is judged by combining the position and visibility information of the feature points, and the image spatial attention mask is reinforced in regions where the clothing texture is rich, such as the front of the chest and the front of the legs; the clothing image feature expression model is then trained by combining the category, style, color, feature point, and posture information of the clothing.
Through this embodiment, obtaining the color information of the first training clothing object by clustering over the histogram of the training image improves the ability of the first color information to represent the color of the clothing object; determining the posture information of the clothing object according to the positions and visibility information of its feature points improves the accuracy with which the posture information is determined.
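A minimal sketch of the color-mining step follows, assuming OpenCV's k-means as the clustering step (clustering the pixels directly stands in for the histogram-guided clustering described above, and k=5 is an assumed value); the center of the most populated cluster is taken as the first color information:

    import cv2
    import numpy as np

    def dominant_color(region_bgr, k=5):
        """Cluster the pixels of a clothing region and return the center color
        of the largest cluster as the overall (dominant) color."""
        pixels = region_bgr.reshape(-1, 3).astype(np.float32)
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
        _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
        counts = np.bincount(labels.flatten(), minlength=k)
        return tuple(int(c) for c in centers[counts.argmax()])  # (B, G, R)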
After obtaining the first feature extraction model, the first clothing region may be input to the first feature extraction model, an output result of the first feature extraction model is obtained, and the output result is used as a clothing feature of the first clothing object.
After the clothing features of the first clothing object are obtained, region matching can be performed directly according to the clothing features of the first clothing object; alternatively, the same-style distinguishing features (target features) of the first clothing object can be obtained from its clothing features, region screening can be performed according to the same-style distinguishing features, and the first clothing region can finally be matched against the screened regions according to the clothing features of the first clothing object.
As an alternative embodiment, after the clothing features of the first clothing object are extracted from the first clothing region, the clothing features of the first clothing object are input into a second feature extraction model, and the target features of the first clothing object output by the second feature extraction model are obtained; candidate clothing regions whose target features match the target features of the first clothing object are acquired from the plurality of reference clothing regions; and a second clothing region whose clothing features match the clothing features of the first clothing object is determined from the candidate clothing regions.
The target features (same-style distinguishing features) of the first clothing object may be extracted with a second feature extraction model. The second feature extraction model can be obtained by training a second initial model with second training samples, each second training sample being an image labeled with the clothing features of a second training clothing object and the same-style identification of that second training clothing object. The same-style identification is used to identify second training clothing objects of the same style (i.e., whether two clothing objects are the same style). The similarity between target features of same-style second training clothing objects extracted by the second feature extraction model is greater than or equal to a second threshold, and the similarity between target features of different-style second training clothing objects is smaller than the second threshold. The second threshold may be set based on empirical values and adjusted as needed.
After the target feature of the first clothing object is obtained, a candidate clothing area with the target feature matched with the target feature of the first clothing object is obtained from the multiple reference clothing areas. For example, a similarity between the target feature of the first clothing object and the target feature of the reference clothing object included in each reference clothing region may be calculated, and a reference clothing region in which the similarity between the target feature of the included reference clothing object and the target feature of the first clothing object is greater than or equal to a third threshold may be regarded as the candidate clothing region.
After the candidate apparel regions are obtained, a second apparel region may be determined from the candidate apparel regions where apparel characteristics match apparel characteristics of the first apparel object. For example, a similarity between a clothing feature of a clothing object contained in the candidate clothing region and a clothing feature of the first clothing object may be calculated, and a candidate clothing region in which the similarity between the clothing feature of the contained reference clothing object and the clothing feature of the first clothing object is greater than or equal to a fourth threshold may be regarded as the second clothing region.
For example, image feature calculation may be performed on the detected clothing region using the trained clothing image feature expression model (the first feature extraction model) to obtain the image expression features (the clothing features of the first clothing object) of the clothing region, and the same-style distinguishing features (target features) of the clothing region may then be calculated using the trained same-style clothing discrimination model (the second feature extraction model) in combination with the generated image expression features.
When region retrieval is performed, the calculated same-style distinguishing features can be used to query a clothing feature database (containing the plurality of reference clothing regions), and the images in the database meeting the similarity requirement are taken as same-style recommendation candidates (candidate clothing regions); the calculated image expression features are then used to confirm the same-style recommendation candidates, identifying whether the clothing region is consistent with the candidate clothing in the database; if so, the consistent candidate clothing region is taken as the same-style clothing region in the video (the second clothing region).
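The two-stage retrieval described above can be sketched as follows; cosine similarity as the metric and the two threshold values are illustrative assumptions:

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def retrieve_same_style(query_target_feat, query_clothing_feat, database,
                            third_threshold=0.8, fourth_threshold=0.9):
        """database: one dict per reference clothing region, with keys
        'target_feat' (same-style distinguishing feature) and 'clothing_feat'.

        Stage 1: screen reference regions by target-feature similarity
                 (candidate clothing regions).
        Stage 2: confirm candidates by clothing-feature similarity
                 (second clothing regions)."""
        candidates = [r for r in database
                      if cosine(query_target_feat, r["target_feat"]) >= third_threshold]
        return [r for r in candidates
                if cosine(query_clothing_feat, r["clothing_feat"]) >= fourth_threshold]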
For a plurality of reference apparel regions, apparel features and target features of the reference apparel objects for each reference apparel region may be predetermined and saved to an apparel feature database.
For example, a clothing feature database may be pre-constructed: clothing region detection is performed on the clothing images of the styles to be promoted; the image features (clothing features) of the clothing region images are extracted with the generated clothing image feature expression model, and the same-style distinguishing features of the clothing are extracted with the generated same-style clothing discrimination model in combination with the generated image features; the image features and the same-style distinguishing features are taken as clothing feature vectors to construct the clothing feature database.
Through this embodiment, first screening the reference clothing regions by the same-style distinguishing features and then matching clothing regions by the clothing features of the clothing object improves the efficiency of clothing region matching and reduces its consumption of processing resources.
The second initial model may be trained using second training samples, resulting in the second feature extraction model.
As an optional embodiment, before inputting the clothing feature of the first clothing object into the second feature extraction model and obtaining the target feature of the first clothing object output by the second feature extraction model, inputting the second training sample into the first feature extraction model to obtain the clothing feature of the second training clothing object output by the first feature extraction model; and training the second initial model by using the clothing features of the second training clothing object and the same-style information of the second training object to obtain a second feature extraction model.
For the second training sample, the clothing features of a second training clothing object included in the second training sample may be extracted using the first feature extraction model. And then, training the second initial model by using the clothing features of the second training clothing object and the same-style information of the second training object to obtain a second feature extraction model.
When training the second feature extraction model, the clothing features of the second training clothing object of each second training sample can be input into the second initial model iteratively to obtain the target features of the second training clothing object output by the second initial model. According to the same-style information, the target features of the second training clothing objects contained in the current second training sample and those contained in the preceding second training samples are adjusted, so that the similarity between the target features of same-style second training clothing objects is greater than or equal to the second threshold, and the similarity between the target features of different-style second training clothing objects is smaller than the second threshold.
It should be noted that the target feature may be a same type distinguishing feature, and may be considered as a joint distinguishing feature of a plurality of clothing features.
For example, a similarity discrimination model for same-style clothing, i.e., the same-style clothing discrimination model (the second feature extraction model), can be trained on a small-scale same-style clothing data set in combination with the generated clothing image feature expression model (the first feature extraction model).
Through this embodiment, training the second initial model in combination with the clothing features extracted by the first feature extraction model improves the extraction efficiency of the second model and simplifies the model training process (the clothing features of the second training clothing objects in the second training samples do not need to be labeled separately).
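A minimal sketch of this staged training, assuming PyTorch and a simple similarity objective over pairs labeled by the same-style identification; freezing the first model, which only supplies input features, is what makes separate labeling of clothing features in the second training samples unnecessary:

    import torch
    import torch.nn as nn

    def train_second_model(first_model, second_model, pair_loader, epochs=10, lr=1e-3):
        """pair_loader yields (image_a, image_b, same_style), where same_style
        is 1 when the two second training clothing objects share a same-style
        identification and 0 otherwise.

        The frozen first model produces clothing features; the second model
        maps them to target (same-style distinguishing) features so that
        same-style pairs score high and different-style pairs score low."""
        first_model.eval()
        opt = torch.optim.Adam(second_model.parameters(), lr=lr)
        for _ in range(epochs):
            for img_a, img_b, same_style in pair_loader:
                with torch.no_grad():  # the first model stays fixed
                    feat_a, feat_b = first_model(img_a), first_model(img_b)
                tgt_a, tgt_b = second_model(feat_a), second_model(feat_b)
                sim = nn.functional.cosine_similarity(tgt_a, tgt_b)
                # Pull same-style similarity toward 1; penalize positive
                # similarity for different-style pairs.
                loss = torch.where(same_style.bool(), 1 - sim, torch.clamp(sim, min=0)).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
        return second_model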
In step S208, in a case where a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, target clothing information corresponding to the second clothing object contained in the second clothing region is acquired, wherein each reference clothing region contains at least one clothing object.
After obtaining a second clothing region matching the first clothing region, target clothing information corresponding to a second clothing object contained in the second clothing region may be obtained, which may include, but is not limited to: description information, link information, or other information associated with the second apparel object.
The target clothing information corresponding to the second clothing object contained in the second clothing region may be obtained by extracting it, according to the article identification corresponding to the second clothing object, from a database storing the clothing information of each clothing object.
In step S210, region tracking is performed on the first clothing region in the video frame sequence of the video to be detected, and occurrence information of the first clothing object in the video to be detected is determined.
After the target clothing information corresponding to the second clothing object contained in the second clothing region is acquired, region tracking is performed on the first clothing region in the video frame sequence of the video to be detected, and the occurrence information of the first clothing object in the video to be detected is determined. The occurrence information indicates where and when the first clothing object appears in the video to be detected and may include, but is not limited to, at least one of: time period information (time point location information) and position information.
The time period (target time period) in which the first clothing object appears in the video to be detected may be a single time period or multiple time periods. In each video frame within the target time period, the position information of the first clothing object may be its coordinate information in the video frame (for example, coordinates expressed as x, y values), or its area information in the video frame (for example, the left or right half; or the middle, upper-left, lower-left, upper-right, or lower-right area).
The position information of the first clothing object appearing in the video to be detected may be the position information of a specific point (e.g., the center point) of the first clothing region in the video. When saving the position information, the position of the first clothing object in every video frame of the target time period may be saved, or only the changes of its position over the target time period may be saved.
For example, suppose the first clothing object appears in the video frames from the 5th to the 10th second of the video to be detected, where in the frames from the 5th to the 7th second the position coordinate of the first clothing object is (x1, y1), in the frames from the 7th to the 9th second it is (x2, y2), and in the frames from the 9th to the 10th second it is (x3, y3). The time period of the first clothing object in the video to be detected is then 5-10 s, and its positions are: 5th-7th second, (x1, y1); 7th-9th second, (x2, y2) (or, stored as a change, (x2-x1, y2-y1)); 9th-10th second, (x3, y3) (or (x3-x1, y3-y1), or (x3-x2, y3-y2)).
As an alternative embodiment, performing region tracking on the first clothing region in the video frame sequence of the video to be detected and determining the occurrence information of the first clothing object in the video to be detected includes: performing region detection, according to the first clothing region, on the video frames before the key frame and the video frames after the key frame in the video to be detected, and determining the time period information and position information of the first clothing object appearing in the video to be detected, wherein the occurrence information includes the time period information and the position information.
After the target clothing information corresponding to the second clothing object contained in the second clothing region is acquired, the time point location information of the first clothing object appearing in the video to be detected can further be determined. The time point location information may be determined as follows: the first clothing region is tracked bidirectionally in the video frame sequence of the video to be detected, and the time period information of the first clothing object appearing in the video to be detected is determined.
Bidirectional tracking can be performed in several ways. For example, region detection may be performed on the video frames before and after the key frame in the video to be detected to determine the video frames in which the first clothing region appears, obtaining the time point location information of the first clothing region and, from it, the time period information of its appearance. As another example, region detection may be performed on the key frames before and after the current key frame to determine the key frames in which the first clothing region appears, likewise obtaining the time point location information and the time period information of its appearance.
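For illustration, bidirectional tracking from the key frame can be sketched as follows; the choice of OpenCV's CSRT tracker and single-frame stepping are assumptions:

    import cv2

    def track_direction(frames, start_idx, bbox, step):
        """Track `bbox` from frames[start_idx] in one direction (step = +1 or -1);
        returns {frame_index: bbox} until the tracker loses the region."""
        tracker = cv2.TrackerCSRT_create()
        tracker.init(frames[start_idx], bbox)
        positions, idx = {start_idx: bbox}, start_idx + step
        while 0 <= idx < len(frames):
            ok, box = tracker.update(frames[idx])
            if not ok:
                break
            positions[idx] = tuple(int(v) for v in box)
            idx += step
        return positions

    def bidirectional_track(frames, key_idx, first_region_bbox):
        """Track the first clothing region backward and forward from the key
        frame to recover the frames (hence the time period) where it appears."""
        positions = track_direction(frames, key_idx, first_region_bbox, step=-1)
        positions.update(track_direction(frames, key_idx, first_region_bbox, step=+1))
        return dict(sorted(positions.items()))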
In addition to the time point location information, the position information of the first clothing region appearing in the video frames and the candidate regions can be determined, so as to further determine the position information of the first clothing object appearing in the video to be detected.
As an alternative embodiment, after region tracking is performed on the first clothing region in the video frame sequence of the video to be detected and the occurrence information of the first clothing object in the video to be detected is determined, control information is added to the video to be detected, wherein the control information is used to control the target clothing information to be displayed, in the form of a pop-up window, at the position corresponding to the position information in the video to be detected when the video to be detected is played to the time period corresponding to the time period information.
That is, after the time point location information and the position information of the first clothing object appearing in the video to be detected are acquired, control information can be added to the video to be detected, the control information controlling the target clothing information to be displayed, in a pop-up window (or in another form), at the position corresponding to the position information when the video to be detected is played to the time period corresponding to the time period information.
With this embodiment, performing region tracking on the first clothing region in the video to be detected makes it possible to determine the time period information and position information of the first clothing object in the video, which facilitates adding the clothing information and improves the accuracy with which it is added.
The above information acquisition method is described below with reference to an alternative example. The method may run on a video server. The information acquisition method in this example combines color information with human-body posture constraints and adopts a staged model tuning strategy to improve the recognition accuracy of same-style clothing.
As shown in fig. 3, the information acquisition method in the present example may include the steps of:
Step S302, training the clothing image feature expression model.
The expression model for recognizing clothing image features, i.e., the clothing image feature expression model (shown in fig. 4), is trained by combining the clothing type, style, color, feature points and posture information.
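The embodiment fixes no network architecture; one plausible realisation, sketched below in PyTorch, is a shared backbone whose embedding is supervised jointly by type, style, color and feature-point heads, so that all of these signals shape the clothing feature. The layer sizes, head dimensions and ResNet-18 backbone are all assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ApparelFeatureModel(nn.Module):
    """Shared backbone with per-attribute heads, so type, style, color and
    feature-point supervision all shape one clothing embedding (a sketch;
    the head sizes and the 256-d embedding are assumptions)."""
    def __init__(self, n_types=20, n_styles=15, n_colors=12, n_points=8, dim=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()             # expose the 512-d pooled feature
        self.backbone = backbone
        self.embed = nn.Linear(512, dim)        # the clothing image feature
        self.type_head = nn.Linear(dim, n_types)
        self.style_head = nn.Linear(dim, n_styles)
        self.color_head = nn.Linear(dim, n_colors)
        self.point_head = nn.Linear(dim, n_points * 3)  # x, y, visibility per point

    def forward(self, images):                  # images: (N, 3, H, W)
        feat = self.embed(self.backbone(images))
        heads = {
            "type": self.type_head(feat),
            "style": self.style_head(feat),
            "color": self.color_head(feat),
            "points": self.point_head(feat),
        }
        return feat, heads
```

During training, classification losses on the type, style and color heads and a regression loss on the feature-point head would be summed; the trained embedding then serves as the clothing image feature.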
Step S304, training the same-style clothing discrimination model.
The similarity discrimination model for same-style clothing, i.e., the same-style clothing discrimination model (shown in fig. 4), is trained on a small-scale same-style clothing dataset.
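A staged realisation consistent with this description would freeze the expression model and fit only a small projection head on the limited same-style dataset, e.g. with a triplet loss. The sketch below is an assumption-laden illustration, not the embodiment's prescribed method; all dimensions and the margin value are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SameStyleHead(nn.Module):
    """Small projection on top of frozen clothing features, so the small
    same-style dataset only has to fit a few parameters (dimensions assumed)."""
    def __init__(self, in_dim=256, out_dim=128):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, in_dim), nn.ReLU(),
                                  nn.Linear(in_dim, out_dim))

    def forward(self, feat):
        return F.normalize(self.proj(feat), dim=-1)  # unit-norm target feature

# A triplet loss pulls features of same-style garments together and pushes
# different styles apart, producing the ">= threshold" / "< threshold"
# separation the discrimination model is described with.
head = SameStyleHead()
criterion = nn.TripletMarginLoss(margin=0.3)
anchor, positive, negative = (torch.randn(4, 256) for _ in range(3))
loss = criterion(head(anchor), head(positive), head(negative))
```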
Step S306, constructing the clothing feature database.
Clothing region detection is performed on each image in the clothing database, the image features (clothing features) of the clothing regions are extracted with the expression model, and the same-style distinguishing features (target features) of the clothing are then extracted with the discrimination model, so as to construct the clothing feature database (as shown in fig. 5).
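A minimal sketch of the database construction, assuming a `detect_clothing` detector and the two models above as black-box callables; none of these names come from the embodiment.

```python
def build_feature_database(catalogue, detect_clothing,
                           expression_model, discrimination_model):
    """For each catalogue image: detect clothing regions, extract the image
    (clothing) feature with the expression model, then the same-style
    distinguishing (target) feature with the discrimination model, and keep
    both so retrieval can use them at different stages."""
    database = []
    for image_id, image in catalogue:
        for (x, y, w, h) in detect_clothing(image):   # hypothetical detector
            crop = image[y:y + h, x:x + w]
            feat = expression_model(crop)             # clothing feature
            target = discrimination_model(feat)       # target feature
            database.append({"id": image_id, "feat": feat, "target": target})
    return database
```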
Step S308, recognizing same-style clothing regions in the video.
Key frames are computed for the video to be processed, clothing detection is performed on each key frame, image features and same-style distinguishing features are computed for each detected clothing region, the same-style distinguishing features are used to query the clothing feature database, the database images meeting the similarity requirement are taken as same-style candidates, the image expression features are then used for same-style confirmation to decide whether the clothing region is consistent with a candidate garment in the database, and if so, the clothing region is taken as a same-style clothing region in the video (as shown in fig. 6).
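The two-stage lookup could be realised as below, with cosine similarity and illustrative thresholds (both assumptions): the target feature recalls same-style candidates, and the image expression feature confirms them.

```python
import numpy as np

def find_same_style(query_feat, query_target, database,
                    recall_t=0.8, confirm_t=0.9):
    """Stage 1 recalls candidates with the same-style distinguishing (target)
    feature; stage 2 confirms them with the image expression feature. A
    non-empty result marks the query region as a same-style clothing region."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    candidates = [e for e in database if cos(query_target, e["target"]) >= recall_t]
    return [e for e in candidates if cos(query_feat, e["feat"]) >= confirm_t]
```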
Step S310, generating frame-level-precision time information for the same-style clothing appearing in the video.
The recognized clothing region is tracked bidirectionally in the video sequence to obtain the time information of the clothing appearing in the video, and within that time period the clothing is prompted and promoted through a video pop-up window.
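Once tracking yields the set of frame indices in which the garment appears, those indices can be collapsed into frame-level-precision time periods; the gap tolerance below is an assumption.

```python
def frames_to_periods(frame_hits, fps, max_gap=5):
    """Collapse the frame indices where the garment was tracked (assumed
    non-empty) into contiguous time periods, tolerating small detection
    gaps; max_gap is an assumption. Returns (start_s, end_s) pairs with
    frame-level precision."""
    frames = sorted(frame_hits)
    periods, start, prev = [], frames[0], frames[0]
    for f in frames[1:]:
        if f - prev > max_gap:              # gap too large: close the period
            periods.append((start / fps, prev / fps))
            start = f
        prev = f
    periods.append((start / fps, prev / fps))
    return periods

# e.g. frames_to_periods({298, 299, 300, 301, 420, 421}, fps=25)
# -> [(11.92, 12.04), (16.8, 16.84)]
```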
With the above method, extracting image features by combining the clothing type, style, color, feature points and posture information improves the expression capability of the clothing features; the staged model training mode improves tuning performance on small datasets; and the recognition method combining multi-level features improves same-style recognition accuracy.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
According to another aspect of the embodiments of the present application, there is provided an information acquisition apparatus for implementing the above-described information acquisition method. Optionally, the apparatus is used to implement the above embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 7 is a block diagram of an alternative information acquisition apparatus according to an embodiment of the present application, and as shown in fig. 7, the apparatus includes:
(1) a first extraction unit 702, configured to extract a key frame from a video to be detected;
(2) a detecting unit 704, connected to the first extracting unit 702, configured to detect a first clothing region containing a first clothing object in the frame image of the key frame;
(3) a second extracting unit 706, connected to the detecting unit 704, configured to extract the clothing features of the first clothing object from the first clothing region, where the clothing features of the first clothing object include at least one of: color information of the first clothing object, and posture information of the first clothing object;
(4) a first obtaining unit 708, connected to the second extracting unit 706, configured to, when a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from the multiple reference clothing regions, obtain the target clothing information corresponding to the second clothing object contained in the second clothing region, where each reference clothing region contains at least one clothing object;
(5) a first determining unit 710, connected to the first obtaining unit 708, configured to perform region tracking on the first clothing region in the video frame sequence of the video to be detected and determine the occurrence information of the first clothing object in the video to be detected.
Alternatively, the first extracting unit 702 may be used to perform step S202 in the foregoing embodiment, the detecting unit 704 to perform step S204, the second extracting unit 706 to perform step S206, the first obtaining unit 708 to perform step S208, and the first determining unit 710 to perform step S210.
With this embodiment, clothing recognition is performed by combining color information and/or posture information of the clothing object, which increases the expression capability of the clothing features for the clothing object, solves the problem in the related art that weak clothing expression capability leads to low clothing recognition accuracy, and improves clothing recognition accuracy.
As an alternative embodiment, the first extraction unit 702 includes:
(1) a first extraction module, configured to extract key frames from the video to be detected according to a target interval; or
(2) a second extraction module, configured to extract the key frame corresponding to each shot from the shots contained in the video to be detected.
As an alternative embodiment, the second extraction unit 706 includes:
(1) an input module, configured to input the first clothing region into the first feature extraction model to obtain the clothing features of the first clothing object output by the first feature extraction model, where the first feature extraction model is obtained by training a first initial model with a first training sample, and the first training sample is an image marked with the first clothing features of a first training clothing object.
As an alternative embodiment, the apparatus further comprises:
(1) a second obtaining unit, configured to obtain a first clothing feature of the first training clothing object before inputting the first clothing region into the first feature extraction model and obtaining the clothing feature of the first clothing object output by the first feature extraction model, where the first clothing feature includes at least one of: first color information of a first training clothing object and first pose information of the first training clothing object;
(2) the first training unit is used for training the first initial model by using a first training sample to obtain a first feature extraction model, wherein the similarity between a second clothing feature extracted from the first training sample by the first feature extraction model and the first clothing feature is greater than or equal to a first threshold, and the second clothing feature comprises at least one of the following features: second color information of the first training clothing object and second pose information of the first training clothing object.
As an alternative embodiment, the second obtaining unit includes at least one of:
(1) an acquisition module, configured to perform histogram calculation and clustering calculation on the first training sample to obtain the first color information (one such computation is sketched after this list);
(2) a first determining module, configured to determine the first posture information according to the marked first position information of the feature points of the first training clothing object and the marked visibility information of the feature points.
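One way to realise the "histogram calculation and clustering calculation" for the first color information is a k-means quantisation of the region's pixels, as sketched below with OpenCV; the colour space and the value of k are assumptions.

```python
import cv2
import numpy as np

def dominant_colors(region_bgr, k=3):
    """Quantise the clothing region's pixels with k-means and return the k
    dominant colours together with their pixel shares (a histogram over the
    clusters); k and the BGR colour space are assumptions."""
    pixels = region_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_PP_CENTERS)
    counts = np.bincount(labels.ravel(), minlength=k)
    share = counts / counts.sum()               # histogram over the clusters
    return centers.astype(np.uint8), share
```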
As an alternative embodiment, the apparatus further comprises:
(1) a third obtaining unit, configured to, after the clothing features of the first clothing object are extracted from the first clothing region, input the clothing features of the first clothing object into a second feature extraction model and acquire the target features of the first clothing object output by the second feature extraction model, where the second feature extraction model is obtained by training a second initial model with a second training sample, the second training sample is an image marked with the clothing features of a second training clothing object and the same-style identification of the second training clothing object, the same-style identification is used to identify clothing of the same style as the second training clothing object, the similarity between target features extracted by the second feature extraction model for second training clothing objects of the same style is greater than or equal to a second threshold, and the similarity between target features extracted for second training clothing objects of different styles is smaller than the second threshold;
(2) a fourth obtaining unit, configured to obtain, from the plurality of reference clothing regions, candidate clothing regions whose target features match the target features of the first clothing object;
(3) a second determining unit, configured to determine, from the candidate clothing regions, the second clothing region whose clothing features match the clothing features of the first clothing object.
As an alternative embodiment, the apparatus further comprises:
(1) an input unit, configured to input the second training sample into the first feature extraction model, before the clothing features of the first clothing object are input into the second feature extraction model and the target features of the first clothing object output by the second feature extraction model are acquired, so as to obtain the clothing features of the second training clothing object output by the first feature extraction model;
(2) a second training unit, configured to train the second initial model with the clothing features of the second training clothing object and the same-style information of the second training clothing object to obtain the second feature extraction model.
As an alternative embodiment, the apparatus further comprises: an adding unit, the first determining unit comprising a second determining module, wherein,
(1) the second determining module is configured to perform region detection separately, according to the first clothing region, on the video frames before the key frame and the video frames after the key frame in the video to be detected, and to determine the time period information and position information of the first clothing object appearing in the video to be detected, where the occurrence information includes the time period information and the position information;
(2) the adding unit is configured to add control information to the video to be detected after region tracking is performed on the first clothing region in the video frame sequence of the video to be detected and the occurrence information of the first clothing object is determined, where the control information is used to control the target clothing information to be displayed, in a pop-up window, at the position corresponding to the position information when the video to be detected is played to the time period corresponding to the time period information.
It should be noted that the above modules may be implemented by software or hardware; for the latter, the modules may, for example, all be located in the same processor, or be distributed among different processors in any combination.
According to yet another aspect of embodiments herein, there is provided a computer-readable storage medium. Optionally, the storage medium has a computer program stored therein, where the computer program is configured to execute the steps in any one of the methods provided in the embodiments of the present application when the computer program is executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, extracting key frames from the video to be detected;
S2, detecting a first clothing region containing a first clothing object in the frame image of a key frame;
S3, extracting the clothing features of the first clothing object from the first clothing region, where the clothing features of the first clothing object include at least one of: color information of the first clothing object, and posture information of the first clothing object;
S4, when a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, obtaining the target clothing information corresponding to the second clothing object contained in the second clothing region, where each reference clothing region contains at least one clothing object;
S5, performing region tracking on the first clothing region in the video frame sequence of the video to be detected, and determining the occurrence information of the first clothing object in the video to be detected.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a variety of media that can store computer programs, such as a usb disk, a ROM (Read-only Memory), a RAM (Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.
According to still another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor (which may be the processor 102 in fig. 1) and a memory (which may be the memory 104 in fig. 1) having a computer program stored therein, the processor being configured to execute the computer program to perform the steps of any of the above methods provided in embodiments of the present application.
Optionally, the electronic apparatus may further include a transmission device (the transmission device may be the transmission device 106 in fig. 1) and an input/output device (the input/output device may be the input/output device 108 in fig. 1), wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, extracting key frames from the video to be detected;
S2, detecting a first clothing region containing a first clothing object in the frame image of a key frame;
S3, extracting the clothing features of the first clothing object from the first clothing region, where the clothing features of the first clothing object include at least one of: color information of the first clothing object, and posture information of the first clothing object;
S4, when a second clothing region whose contained clothing object has clothing features matching those of the first clothing object is determined from a plurality of reference clothing regions, obtaining the target clothing information corresponding to the second clothing object contained in the second clothing region, where each reference clothing region contains at least one clothing object;
S5, performing region tracking on the first clothing region in the video frame sequence of the video to be detected, and determining the occurrence information of the first clothing object in the video to be detected.
Optionally, for an optional example in this embodiment, reference may be made to the examples described in the above embodiment and optional implementation, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. An information acquisition method, comprising:
extracting key frames from a video to be detected;
detecting a first clothing region containing a first clothing object in a frame image of the key frame;
extracting a clothing feature of the first clothing object from the first clothing region, wherein the clothing feature of the first clothing object comprises at least one of: color information of the first clothing object, and posture information of the first clothing object;
under the condition that a second clothing region, in which the clothing feature of the contained clothing object matches the clothing feature of the first clothing object, is determined from a plurality of reference clothing regions, obtaining target clothing information corresponding to the second clothing object contained in the second clothing region, wherein each reference clothing region contains at least one clothing object;
and performing region tracking on the first clothing region in the video frame sequence of the video to be detected, and determining the occurrence information of the first clothing object in the video to be detected.
2. The method of claim 1, wherein extracting the key frames from the video to be detected comprises:
extracting the key frames from the video to be detected according to a target interval; or
extracting the key frames corresponding to the shots from the shots contained in the video to be detected.
3. The method of claim 1, wherein extracting the clothing feature of the first clothing object from the first clothing region comprises:
inputting the first clothing region into a first feature extraction model to obtain clothing features of the first clothing object output by the first feature extraction model, wherein the first feature extraction model is obtained by training a first initial model by using a first training sample, and the first training sample is an image marked with the first clothing features of the first training clothing object.
4. The method of claim 3, wherein prior to inputting the first clothing region to the first feature extraction model, resulting in clothing features of the first clothing object output by the first feature extraction model, the method further comprises:
obtaining the first apparel feature of the first training apparel object, wherein the first apparel feature comprises at least one of: first color information of the first training clothing object and first pose information of the first training clothing object;
training the first initial model by using the first training sample to obtain the first feature extraction model, wherein the similarity between a second clothing feature extracted from the first training sample by the first feature extraction model and the first clothing feature is greater than or equal to a first threshold, and the second clothing feature comprises at least one of the following: second color information of the first training clothing object and second pose information of the first training clothing object.
5. The method of claim 4, wherein obtaining the first garment characteristic of the first training garment object comprises at least one of:
performing histogram calculation and clustering calculation on the first training sample to obtain the first color information;
and determining the first posture information according to the marked first position information of the feature points of the first training clothing object and the marked visibility information of the feature points.
6. The method of claim 3, wherein after extracting the apparel feature of the first apparel object from the first apparel area, the method further comprises:
inputting the clothing feature of the first clothing object into a second feature extraction model, and acquiring the target feature of the first clothing object output by the second feature extraction model, wherein the second feature extraction model is obtained by training a second initial model with a second training sample, the second training sample is an image marked with the clothing feature of a second training clothing object and a same-style identification of the second training clothing object, the same-style identification is used for identifying clothing of the same style as the second training clothing object, the similarity between target features extracted by the second feature extraction model for second training clothing objects of the same style is greater than or equal to a second threshold, and the similarity between target features extracted by the second feature extraction model for second training clothing objects of different styles is smaller than the second threshold;
obtaining, from the plurality of reference clothing regions, candidate clothing regions whose target features match the target feature of the first clothing object;
determining, from the candidate clothing regions, the second clothing region whose clothing feature matches the clothing feature of the first clothing object.
7. The method of claim 6, wherein before inputting the clothing feature of the first clothing object into the second feature extraction model and obtaining the target feature of the first clothing object output by the second feature extraction model, the method further comprises:
inputting the second training sample into the first feature extraction model to obtain clothing features of the second training clothing object output by the first feature extraction model;
and training the second initial model by using the clothing feature of the second training clothing object and the same-style information of the second training clothing object to obtain the second feature extraction model.
8. The method according to any one of claims 1 to 7,
performing region tracking on the first clothing region in the video frame sequence of the video to be detected and determining the occurrence information of the first clothing object in the video to be detected comprises: performing region detection separately, according to the first clothing region, on the video frames before the key frame and the video frames after the key frame in the video to be detected, and determining time period information and position information of the first clothing object appearing in the video to be detected, wherein the occurrence information comprises the time period information and the position information;
after performing region tracking on the first clothing region in the video frame sequence of the video to be detected and determining the occurrence information of the first clothing object in the video to be detected, the method further comprises: adding control information to the video to be detected, wherein the control information is used for controlling the target clothing information to be displayed, in a pop-up window, at the position corresponding to the position information when the video to be detected is played to the time period corresponding to the time period information.
9. An apparatus for acquiring information, comprising:
the first extraction unit is used for extracting key frames from a video to be detected;
a detecting unit, configured to detect a first clothing region containing a first clothing object in the frame image of the key frame;
a second extraction unit, configured to extract the clothing feature of the first clothing object from the first clothing region, wherein the clothing feature of the first clothing object comprises at least one of: color information of the first clothing object, and posture information of the first clothing object;
a first obtaining unit, configured to, when a second clothing region whose contained clothing object has a clothing feature matching the clothing feature of the first clothing object is determined from a plurality of reference clothing regions, obtain target clothing information corresponding to the second clothing object contained in the second clothing region, wherein each reference clothing region contains at least one clothing object;
and a first determining unit, configured to perform region tracking on the first clothing region in the video frame sequence of the video to be detected and determine the occurrence information of the first clothing object in the video to be detected.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 8 when executed.
11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 8 by means of the computer program.