WO2022134700A1 - Target object recognition method and device - Google Patents

Target object recognition method and device

Publication number
WO2022134700A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
initial
picture
target object
initial picture
Prior art date
Application number
PCT/CN2021/120387
Other languages
English (en)
French (fr)
Inventor
徐宝函
李佩易
Original Assignee
上海幻电信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海幻电信息科技有限公司
Priority to EP21908701.2A (EP4206978A4)
Publication of WO2022134700A1
Priority to US18/131,993 (US20230281990A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/174Segmentation; Edge detection involving the use of two or more images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/147Determination of region of interest
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Definitions

  • The present application relates to the field of computer technology, and in particular to a target object recognition method.
  • The present application also relates to a target object recognition device, a computing device, a computer-readable storage medium, and a computer program product.
  • An embodiment of the present application provides a target object recognition method.
  • The present application also relates to a target object recognition device, a computing device, and a computer-readable storage medium, to solve the technical defect in the prior art that important information in pictures or videos is recognized with poor accuracy.
  • a target object recognition method including:
  • the target picture corresponding to the target position is input into the recognition model to obtain one or more target objects in the initial picture.
  • a target object identification device including:
  • an initial position determination module configured to input the received initial picture into the first detection model to obtain the initial positions of one or more target objects in the initial picture
  • a verification position determination module configured to input the candidate picture corresponding to the initial position into the second detection model, to obtain the verification object in the candidate picture and the verification position of the verification object in the candidate picture;
  • a target position determination module configured to adjust the initial positions of the one or more target objects based on the verified positions to obtain target positions of the one or more target objects
  • the target object obtaining module is configured to input the target picture corresponding to the target position into the recognition model to obtain one or more target objects in the initial picture.
  • A computing device including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the target object recognition method.
  • A computer-readable storage medium that stores computer instructions which, when executed by a processor, implement the steps of the target object recognition method.
  • A computer program product which, when executed in a computer, causes the computer to execute the steps of the aforementioned target object recognition method.
  • In the target object recognition method and device provided by the present application, the target object recognition method includes: inputting the received initial picture into a first detection model to obtain the initial positions of one or more target objects in the initial picture;
  • inputting the candidate picture corresponding to the initial position into a second detection model to obtain the verification object in the candidate picture and the verification position of the verification object in the candidate picture;
  • adjusting the initial positions of the one or more target objects based on the verification position to obtain the target positions of the one or more target objects;
  • inputting the target picture corresponding to the target position into the recognition model to obtain one or more target objects in the initial picture.
  • The target object recognition method adopts a first detection model and a second detection model that are pre-trained lightweight neural networks, which can quickly and accurately extract the target objects in pictures or videos of different scenes and formats.
  • The verification position can also assist in determining the final position of the target object, so that a more accurate recognition result can be obtained through the recognition model.
  • FIG. 1 is a schematic diagram of a specific application structure of a target object recognition method provided by an embodiment of the present application;
  • FIG. 2 is a flowchart of a target object recognition method provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an initial picture in a target object recognition method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a target object recognition method provided in an embodiment of the present application applied to a game competition scene;
  • FIG. 5 is a schematic structural diagram of a target object recognition device provided by an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a computing device provided by an embodiment of the present application.
  • Although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, the first could also be referred to as the second, and similarly, the second could be referred to as the first, without departing from the scope of one or more embodiments of the present application.
  • Depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
  • Template matching is a technique of finding the most similar part of an image to another template image, usually through traditional image processing methods such as sliding windows.
  • Object detection: finding all the objects of interest in a picture through template matching or a neural network.
  • OCR: Optical Character Recognition, the process of analyzing and recognizing image files of text data to obtain text and layout information.
  • SSD: Single Shot MultiBox Detector, an object detection algorithm.
  • Faster-RCNN: an object detection method based on a CNN (Convolutional Neural Network); a complete end-to-end CNN object detection model.
  • Logo: a design term referring to the identifying mark designed for a product, business, website, etc. to suit its own theme or activities.
  • a target object recognition method is provided.
  • The present application also relates to a target object recognition device, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
  • FIG. 1 shows a schematic diagram of a specific application structure of a target object recognition method provided according to an embodiment of the present application.
  • the video processing method provided by the embodiment of the present application is applied to a computer, a server, or a cloud service.
  • The application scenario of FIG. 1 includes a CPU (Central Processing Unit)/GPU (Graphics Processing Unit) 101, a data storage module 103, a preprocessing module 105, a scoring area detection module 107, a number recognition module 109, and an information extraction model 111;
  • The CPU/GPU 101 starts to work and obtains the video or pictures to be processed that are stored in the data storage module 103. It then controls the preprocessing module 105 to extract the key frames that need to be recognized from the video to be processed, and to preprocess the picture or key frame according to the input requirements of the scoring area detection module 107. The picture or key frame is then input into the scoring area detection module 107, which detects the scoring area in the picture or key frame and assists in positioning it;
  • the final scoring area in the picture or the key frame is input
  • The target object recognition method provided in this application uses a lightweight neural network model, instead of the template matching of the prior art, to detect the scoring area in a video or picture, and also proposes using the position of a specific logo to assist in precisely positioning the scoring area. The scoring area can therefore be extracted quickly and accurately from pictures or videos of different scenes and layouts. In addition, the numbers in the scoring area can also be recognized precisely based on a lightweight neural network model.
  • FIG. 2 shows a flowchart of a target object recognition method provided according to an embodiment of the present application, which specifically includes the following steps:
  • Step 202 Input the received initial picture into the first detection model, and obtain the initial positions of one or more target objects in the initial picture.
  • The target object recognition method provided in the present application can be applied in game competition scenarios to recognize competition scores, in entertainment game scenarios to recognize game scores, and in other application scenarios that require score recognition; this application places no limitation on this.
  • The following describes in detail the case in which the target object recognition method is applied to a game scene to recognize the score in a game competition.
  • the initial pictures include but are not limited to pictures of any type and content; for example, game pictures, competition pictures, or pictures formed by video frames in a video, etc., and the target object can be understood as a score.
  • the received initial picture is input into a first detection model, and the initial position of one or more scores in the initial picture is obtained through the first detection model;
  • The first detection model includes but is not limited to an SSD model based on MobileNet. MobileNet is a lightweight network suitable for mobile terminals, and SSD, as a one-stage detection network, is faster than two-stage detection networks such as Faster-RCNN.
  • FIG. 3 shows a schematic diagram of an initial picture in a target object recognition method provided according to an embodiment of the present application.
  • FIG. 3 is a game picture in the game scene, and the game picture includes the scores of the game competition, such as individual scores, team scores, and the like.
  • The game picture is input into the MobileNet-based SSD model to obtain the initial positions of the various scores in the game picture, such as initial position 1 of score 1, initial position 2 of score 2, initial position 3 of score 3, and initial position 4 of score 4 in FIG. 3.
  • Step 204 Input the candidate picture corresponding to the initial position into the second detection model, and obtain the verification object in the candidate picture and the verification position of the verification object in the candidate picture.
  • The first detection model and the second detection model may be detection models of the same type or of different types.
  • The first detection model is used to recognize the score position. Therefore, during model training, the training samples used by the first detection model are game pictures, and the corresponding labels are the positions of the scores in the game pictures. In practical applications, there is usually a logo icon next to each score that indicates the meaning of the score, such as individual score, team score, or kill score. The second detection model is therefore used to recognize the logo icon next to each scoring position that the first detection model outputs for the game picture; during model training, the training samples used by the second detection model are game pictures containing scores, and the corresponding labels are the logo icon corresponding to each score and the position of that logo icon.
  • The candidate picture corresponding to the initial position of each target object is input into the second detection model, and the second detection model is used to obtain, in the candidate picture, the logo icon corresponding to the target object and the verification position of that logo icon.
  • Step 206 Adjust the initial positions of the one or more target objects based on the verification positions to obtain target positions of the one or more target objects.
  • The initial position of each corresponding target object is adjusted by using the verification position to obtain the target position of the target object, so that the target object can subsequently be recognized accurately based on the accurate target position.
  • The verification position is the position of the logo icon adjacent to the target object, so the position of each corresponding target object can be adjusted based on the position of the logo icon; specifically, the position of the logo and the scoring area are used.
  • The cropping is then more accurate and faster, and determining an accurate scoring position benefits the subsequent recognition of the digits at the scoring position.
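  • The source does not spell out the adjustment rule, so the following is only a minimal sketch under stated assumptions: the candidate picture is taken to be the crop of the initial picture at the initial position, the logo icon is taken to sit to the left of the score, and the hypothetical helpers `Box` and `adjust_position` are introduced purely for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def adjust_position(initial: Box, logo_in_candidate: Box) -> Box:
    """Trim the initial position so that it starts at the logo's right edge.

    Assumptions (not from the source): the candidate picture is the crop of
    the initial picture at `initial`, and the logo sits left of the score.
    """
    # Map the logo's right edge from candidate-picture coordinates
    # to initial-picture coordinates.
    logo_right = initial.left + logo_in_candidate.right
    # Clamp so the adjusted box never leaves the initial box.
    new_left = min(max(logo_right, initial.left), initial.right)
    return Box(new_left, initial.top, initial.right, initial.bottom)


initial = Box(left=100, top=20, right=180, bottom=50)
logo = Box(left=2, top=3, right=26, bottom=27)
print(adjust_position(initial, logo))
```

  • Other layouts (logo above or to the right of the score) would need a different rule; the point is only that the verification position, once mapped into the initial picture, tightens the crop passed to the recognition model.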
  • Step 208 Input the target picture corresponding to the target position into the recognition model to obtain one or more target objects in the initial picture.
  • The recognition model includes, but is not limited to, a multi-label classification model. The multi-label classification model can use a lightweight network suitable for mobile terminals, such as MobileNet, and the classification label output by the classification model includes the number of digits and the specific category (0 to 9) of each digit. For example, if the target object at the target position is a score of 21, the target picture containing the target object at the target position is input into the recognition model, and the recognition model outputs [2, 2, 1], where the first 2 means that the score has two digits, the second 2 means that the first digit of the score is 2, and the 1 means that the second digit of the score is 1.
  • With the recognition model, numbers with an indeterminate number of digits can be recognized, as can background categories that contain no numbers; for a background category, the number of digits of the score is 0.
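  • The label layout described above (digit count first, then one category per digit) can be decoded as in the following minimal sketch. The function name `decode_score` and the choice to return `None` for the background category are assumptions made for illustration, not part of the source.

```python
from typing import List, Optional


def decode_score(labels: List[int]) -> Optional[int]:
    """Decode [digit_count, d1, d2, ...] into an integer score.

    Returns None for the background category (digit_count == 0).
    """
    if not labels:
        raise ValueError("empty label vector")
    count = labels[0]
    if count == 0:
        return None  # background: the target picture contains no number
    digits = labels[1:1 + count]
    if len(digits) != count or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("malformed label vector")
    score = 0
    for d in digits:
        score = score * 10 + d  # most significant digit first
    return score


print(decode_score([2, 2, 1]))  # 21
print(decode_score([0]))        # None
```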
  • an initial picture will include multiple target objects, each target object corresponds to an initial position, and then the candidate picture corresponding to the initial position of each target object is input into the second detection model, and the target object can be obtained.
  • The target picture corresponding to each target position is input into the recognition model to obtain the target object in that target picture, and the target objects in the target pictures corresponding to all the target positions are aggregated; that is, all target objects in the initial picture can be determined.
  • The target object recognition method uses multiple detections and logo-assisted positioning to locate the scoring area more accurately, which improves the positional precision of the target object and achieves pixel-level control of the target object. It also uses a lightweight network model, so that target objects can be quickly extracted and recognized in complex, diverse pictures or videos of different versions.
  • The initial picture may be a video frame of a video. In that case, before the received initial picture is input into the first detection model, the method further includes:
  • receiving the video to be processed, and extracting the i-th video frame from the video to be processed as an initial picture based on preset extraction rules, where i ∈ [1, n] and i is a positive integer.
  • The preset extraction rules can be set according to the actual application, for example, extracting a video frame every one, two, or three seconds as an initial picture, or using a video frame scoring model to score each video frame in the video and using the high-scoring video frames as initial pictures.
  • In specific implementation, the video to be processed is received, and the i-th video frame is extracted from the video to be processed as an initial picture based on a preset extraction rule, where i ranges from 1 to n and i is a positive integer. For example, if n is 5, then 5 video frames are extracted from the video to be processed as initial pictures based on the preset extraction rule.
  • The target object recognition method can thus be applied to target object recognition in video: some video frames of the video to be processed are used as initial pictures, so that the target objects in the video frames of the video to be processed are accurately recognized.
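  • As an illustration of the fixed-interval extraction rule, the following minimal sketch computes which frame indices would be taken as initial pictures. The function name `sample_frame_indices` and its parameters (`total_frames`, `fps`, `interval_s`) are hypothetical; decoding the frames themselves is left to whatever video library is in use.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> List[int]:
    """Return the indices of the frames taken every `interval_s` seconds."""
    if total_frames < 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval_s must be > 0")
    # Number of frames between two consecutive initial pictures (at least 1).
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


# A 5-second clip at 30 frames per second, sampled once per second,
# yields n = 5 initial pictures.
print(sample_frame_indices(total_frames=150, fps=30.0, interval_s=1.0))
# [0, 30, 60, 90, 120]
```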
  • the inputting the received initial picture into the first detection model to obtain the initial positions of one or more target objects in the initial picture includes:
  • inputting the target picture corresponding to the target position into the recognition model to obtain the target object in the initial picture includes:
  • the initial picture is several video frames of the video to be processed
  • one or more target objects in each initial picture are identified based on the target object identification method of the present application.
  • the candidate picture corresponding to the position is input into the second detection model to obtain the verification object in the candidate picture and the verification position of the verification object in the candidate picture; the initial position of each target object is adjusted based on the verification position to obtain the target position of each target object; the target picture corresponding to the target position is input into the recognition model to obtain all target objects in the first initial picture. Once all target objects in the first initial picture have been obtained, it is determined whether the first initial picture is the last initial picture of the video to be processed, that is, whether i is greater than n. If so, all the target objects in the first initial picture are counted; if not, the second initial picture is input into the first detection model and the above steps are performed again, looping until the target objects in every initial picture formed by the video frames extracted from the video to be processed have been recognized.
  • All the target objects in each initial picture are then counted. Continuing the above example, the scores in each game picture are counted, such as the individual score, team score, and kill score.
  • The target object recognition method can recognize not only one or more target objects in a single initial picture, but also one or more target objects in each initial picture formed by the key frames extracted from the video to be processed. For the target object in each key frame, multiple detections and logo-assisted positioning are used to locate the scoring area more accurately, which improves the positional precision of the target object and achieves pixel-level control of the target object; and a lightweight network model is used, so that target objects can be quickly extracted and recognized in complex, diverse videos of different versions.
  • the target object may not be included in the video frames extracted from the video to be processed.
  • For example, the video frames at the beginning of a game video are an introduction to the game. Since the game has not yet started, the target object, namely the score, does not exist in these video frames.
  • Therefore, after each initial picture is received, it is detected whether the initial picture contains the target object; if it does not, detection continues with the next initial picture. This avoids performing subsequent operations such as position acquisition and target object recognition when the initial picture contains no target object, which would waste system processing resources and give the user a poor experience.
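  • The per-picture loop described above (detect, verify, adjust, recognize, and move on when no target object is present) can be sketched as follows. This is only a control-flow illustration: `recognize_pictures` and its four callable parameters are hypothetical stand-ins for the first detection model, the second detection model, the position adjustment, and the recognition model, and the toy stubs at the bottom exist only to make the sketch runnable.

```python
from typing import Any, Callable, List, Sequence


def recognize_pictures(
    pictures: Sequence[Any],
    detect: Callable[[Any], List[Any]],    # picture -> initial positions
    verify: Callable[[Any, Any], Any],     # (picture, initial position) -> verification position
    adjust: Callable[[Any, Any], Any],     # (initial, verification) -> target position
    recognize: Callable[[Any, Any], Any],  # (picture, target position) -> target object
) -> List[List[Any]]:
    """Return the recognized target objects for every initial picture."""
    results: List[List[Any]] = []
    for picture in pictures:
        initial_positions = detect(picture)
        if not initial_positions:
            # No target object in this picture: skip the later stages.
            results.append([])
            continue
        objects = []
        for initial in initial_positions:
            verification = verify(picture, initial)
            target = adjust(initial, verification)
            objects.append(recognize(picture, target))
        results.append(objects)
    return results


# Toy stand-ins: a "picture" is a dict holding its scores; a position is an index.
frames = [{"scores": []}, {"scores": [3, 7]}]
print(recognize_pictures(
    frames,
    detect=lambda p: list(range(len(p["scores"]))),
    verify=lambda p, pos: pos,
    adjust=lambda initial, verification: verification,
    recognize=lambda p, pos: p["scores"][pos],
))
# [[], [3, 7]]
```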
  • In an initial picture, the recognition model may recognize only the background that blocks the target object, in which case the specific target object cannot be correctly recognized.
  • In that case, to ensure that a target object is obtained for each initial picture, the target object of the immediately preceding initial picture can be used instead.
  • the inputting the target picture corresponding to the target position into the recognition model to obtain one or more target objects in the i-th initial picture including:
  • inputting the target picture corresponding to the target position into the recognition model and, if the picture background of the i-th initial picture does not meet the predetermined condition, using one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
  • the predetermined conditions include, but are not limited to, the target object can be detected or not blocked, and the like.
  • For example, a game character blocks the scoring area while moving.
  • In this case, when the picture at the corresponding score position is input into the recognition model, the recognition model can only recognize the occluding game character, that is, the picture background; the score in the initial picture immediately preceding that initial picture is then used as its score.
  • The target objects in two adjacent initial pictures generally do not differ much. Therefore, to ensure that a target object is obtained for every initial picture, when the recognition model cannot recognize the target object in an initial picture, the target object of the immediately preceding initial picture is used instead, which meets the needs of subsequent practical applications based on the target object of each initial picture, such as selecting key initial pictures by score.
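  • A minimal sketch of this carry-forward behavior follows, assuming (for illustration only) that each initial picture yields a single score and that an unrecognizable score is represented as `None`. The function name `fill_from_previous` is hypothetical.

```python
from typing import List, Optional


def fill_from_previous(scores: List[Optional[int]]) -> List[Optional[int]]:
    """Replace each unrecognized score (None) with the preceding picture's score."""
    filled: List[Optional[int]] = []
    previous: Optional[int] = None
    for score in scores:
        if score is None:
            # Occluded or background-only picture: reuse the previous result.
            score = previous
        filled.append(score)
        previous = score
    return filled


print(fill_from_previous([3, None, 4, None, None]))  # [3, 3, 4, 4, 4]
```

  • A leading `None` stays `None` in this sketch, because the first initial picture has no predecessor to borrow from.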
  • the method further includes:
  • if one or more target objects in the i-th initial picture do not meet the preset target object recognition rule, using one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
  • the preset target object recognition rules may be set according to specific application scenarios, which are not limited in this application.
  • the target objects are individual scores and team scores
  • the preset target object recognition rules may include: Team scores are greater than individual scores.
  • the corresponding one or more target objects in the second initial picture are used as the one or more target objects in the third initial picture.
  • An initial picture of the game video includes the individual score and the team score, and the team score is bound to be greater than the individual score. If the individual score recognized in the third initial picture is greater than the team score, it can be determined that the individual score and team score in the third initial picture were recognized incorrectly. In this case, they need to be corrected, and the individual score and team score in the second initial picture are used as the individual score and team score of the third initial picture.
  • The accuracy of the target object recognized in each initial picture is verified based on the preset target object recognition rules. If the recognized target object does not meet the rules, it can be determined that target object recognition failed for that initial picture; in that case, to ensure the accuracy of the target object recognized in each initial picture, the target object in the initial picture for which recognition failed can be replaced with the recognition result of the immediately preceding initial picture.
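  • The rule check can be sketched as follows. The dictionary keys `individual` and `team` and the function name `enforce_team_rule` are hypothetical, and the sketch treats only the case named explicitly in the text (individual score greater than team score) as a violation; whether equal scores should also count as a violation is an assumption left open here.

```python
from typing import Dict, List


def enforce_team_rule(frames: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Replace results that violate the rule with the preceding picture's result."""
    corrected: List[Dict[str, int]] = []
    for frame in frames:
        violates = frame["individual"] > frame["team"]
        if violates and corrected:
            # Recognition failed: reuse the (already corrected) previous result.
            frame = corrected[-1]
        corrected.append(dict(frame))
    return corrected


results = [
    {"individual": 2, "team": 10},
    {"individual": 3, "team": 12},
    {"individual": 31, "team": 12},  # misrecognized: individual > team
]
print(enforce_team_rule(results)[2])  # {'individual': 3, 'team': 12}
```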
  • The scores in a game video, and in the initial pictures extracted from it, are continuous; that is, within a certain time window, the scores in the initial pictures do not differ greatly. If the score in an initial picture fluctuates greatly within a certain time window, that score may have been recognized incorrectly.
  • In that case the score is corrected, and the specific implementation is as follows:
  • the method further includes:
  • the target objects that do not meet the preset target object arrangement rules in each object sequence are used as adjustment objects;
  • the adjustment object is adjusted based on one or more target objects in initial pictures adjacent to the initial picture corresponding to the adjustment object.
  • the preset time period may be set according to actual needs, for example, the preset time period may be 5 seconds, 10 seconds, and the like.
  • Taking a preset time period of 5 seconds as an example, the initial pictures of the video to be processed are grouped at 5-second intervals and the target objects in all initial pictures within each 5 seconds are obtained. It is then judged whether the arrangement of the target objects within each 5 seconds satisfies the preset target object arrangement rule. If so, the target objects in all initial pictures within those 5 seconds are considered accurate; if not, an abnormal target object exists among them. The target object in the initial picture containing the abnormal target object can then be adjusted according to the target objects of the adjacent initial pictures, for example by median filtering.
  • the preset target object arrangement rule may be set according to a specific application scenario, which is not limited in this application.
  • For example, in a game scenario the preset target object arrangement rule may require that the target objects increase or remain unchanged as match time elapses.
  • In a normal game match, the individual score and team score in the initial pictures either remain unchanged or increase over time. For example, with a preset time period of 5 seconds, suppose the scores of the initial pictures obtained within 5 seconds are arranged as [5, 5, 8, 5, 5]. The score recognized in the third initial picture is then probably incorrect and needs to be adjusted using the target objects in the adjacent second and fourth initial pictures; for example, the score in the third initial picture is corrected to 5 by median filtering.
  • After the target object in each initial picture is obtained through the recognition model, its accuracy can be judged from the way the target object changes in the given application scenario. When the target object in an initial picture is recognized incorrectly, it can be corrected by median filtering, which further ensures the accuracy of the target objects in the initial pictures.
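The windowed check and median correction can be sketched as below. It is an illustrative sketch only: one score per initial picture, a fixed number of pictures per object sequence, and median filtering of the whole offending sequence (rather than of a single adjustment object) are simplifying assumptions, and all function names are hypothetical.

```python
from statistics import median_low
from typing import List


def split_into_sequences(scores: List[int], size: int) -> List[List[int]]:
    """Divide the per-picture scores into object sequences of `size` pictures."""
    return [scores[i:i + size] for i in range(0, len(scores), size)]


def is_non_decreasing(sequence: List[int]) -> bool:
    """Arrangement rule from the text: scores stay equal or increase over time."""
    return all(a <= b for a, b in zip(sequence, sequence[1:]))


def median_filter(sequence: List[int], window: int = 5) -> List[int]:
    """Replace each score by the median of its neighbours inside the window."""
    half = window // 2
    return [
        median_low(sequence[max(0, i - half):i + half + 1])
        for i in range(len(sequence))
    ]


def correct_scores(scores: List[int], size: int = 5) -> List[int]:
    """Median-filter only the sequences that break the arrangement rule."""
    corrected: List[int] = []
    for sequence in split_into_sequences(scores, size):
        if is_non_decreasing(sequence):
            corrected.extend(sequence)
        else:
            corrected.extend(median_filter(sequence, window=size))
    return corrected
```

For the sequence [5, 5, 8, 5, 5] from the text, `correct_scores` returns [5, 5, 5, 5, 5]. `median_low` is used so that the corrected value is always one of the recognized scores rather than an average.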
  • the target object includes a first target object and a second target object
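  • after the one or more target objects in each initial picture are collected, the method further includes:
  • receiving an acquisition request for the first target object, and determining the first target object and the second target object in each initial picture based on the acquisition request, where the second target object is associated with the first target object;
  • displaying the first target object in each initial picture when the second target object in that initial picture is updated as the first target object increases.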
  • Continuing the example, in a game scenario the target object is the score, the first target object is the individual score, and the second target object is the team score. In a given game, the user pays most attention to the moments of individual kills. After the individual and team scores in each initial picture have been recognized and filtered, the team kill score should increase whenever the individual kill score increases. The individual kill score and team kill score in each initial picture can therefore be judged together to filter out misrecognized scores, so that correct individual kill information is finally returned to the user.
  • When the user cares about a particular target object, its accuracy can be checked through the association between that target object and other target objects in the initial picture, so that wrongly recognized target objects are filtered out and adjusted. Displaying only accurate target objects that the user cares about can greatly improve the user experience.
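One possible way to use the association between the two target objects is sketched below. The text does not specify the exact test, so the criterion used here (an individual gain counts only if the team score rose by at least the same amount in the same picture) and the function name are assumptions.

```python
from typing import List


def kill_moments(individual: List[int], team: List[int]) -> List[int]:
    """Return indices of pictures where an individual kill is confirmed.

    A gain in the individual score is accepted only when the team score
    increased by at least the same amount between the same two pictures.
    """
    if len(individual) != len(team):
        raise ValueError("both score lists must cover the same pictures")
    moments: List[int] = []
    for i in range(1, len(individual)):
        gained = individual[i] - individual[i - 1]
        if gained > 0 and team[i] - team[i - 1] >= gained:
            moments.append(i)
    return moments
```

In the hypothetical series `individual = [2, 2, 3, 3, 7, 3]` and `team = [10, 10, 11, 12, 12, 12]`, only picture 2 is reported; the jump to 7 is not matched by the team score and is treated as a misrecognition.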
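  • the method further includes:
  • extracting the one or more target objects in the initial pictures, and taking the initial pictures whose target objects satisfy a preset extraction condition as target initial pictures;
  • generating a target video based on the target initial pictures, and sending the target video to the user.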
  • The preset extraction condition may be set according to the actual application; for example, the condition may be that the target object is greater than or equal to a preset target object threshold, i.e., that the score reaches a given number of points.
  • For example, if the preset extraction condition is that the individual score is greater than 80 points, the initial pictures whose individual score is greater than 80 are extracted as target pictures, and a highlight video is generated from these target pictures and recommended to the user.
  • a target video of interest to the user can be generated based on the target object according to actual application requirements, so as to increase the user's attention to the video.
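The extraction condition can be illustrated with a short sketch. The threshold of 80 comes from the example above; representing each initial picture by its index and merging consecutive indices into clip segments are assumptions made for illustration, since the text does not say how the target video is assembled.

```python
from typing import List, Tuple


def select_target_pictures(scores: List[int], threshold: int = 80) -> List[int]:
    """Indices of initial pictures whose score exceeds the threshold."""
    return [index for index, score in enumerate(scores) if score > threshold]


def group_into_segments(indices: List[int]) -> List[Tuple[int, int]]:
    """Merge consecutive picture indices into (start, end) segments."""
    segments: List[Tuple[int, int]] = []
    for index in indices:
        if segments and index == segments[-1][1] + 1:
            # Extend the current segment by one picture.
            segments[-1] = (segments[-1][0], index)
        else:
            segments.append((index, index))
    return segments
```

For hypothetical scores `[10, 85, 90, 70, 81]`, pictures 1, 2 and 4 are selected and grouped into the segments `(1, 2)` and `(4, 4)`.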
  • the target object recognition method is further described, and specifically comprises the following steps:
  • Step 402 Input the game video.
  • Step 404 Extract multiple key frames from the game video as initial pictures based on preset extraction rules.
  • Step 406 Perform score area detection and localization on each initial picture.
  • Score area detection is a form of key area detection, where a key area is an area of a picture or video containing important information that users often pay attention to.
  • For example, in poster pictures and videos of game matches and real-world matches (football, basketball), users tend to pay attention to the specific score area. Because the resolution, size, layout and interface of pictures and videos vary widely, it is difficult to locate key areas with the template matching of the prior art.
  • To meet the need for fast key area detection on mobile terminals, this application uses a lightweight detection model, namely a MobileNet-based SSD model.
  • Mobilenet is a lightweight network suitable for mobile terminals, and SSD, as a one-stage detection network model, is faster than two-stage networks such as Faster-RCNN.
  • the lightweight detection model is faster, but the accuracy is often affected to a certain extent, and there may be deviations in the positioning of the score area, which will affect the subsequent score number recognition.
  • the present application further detects the iconic Logo in each picture or video frame, and in videos of different interfaces, the position of the Logo is used to assist the positioning of the score area. Through the detection and auxiliary positioning of the score area, the specific position of each score area that needs to be recognized in the picture or video is obtained, and then passed to the subsequent recognition model for score recognition.
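The text does not give the geometry used to refine the score area from the Logo position, so the following is only a sketch under stated assumptions: boxes are `(x1, y1, x2, y2)` pixel tuples, the Logo box is reported relative to the cropped score area, and the digits are assumed to lie to the right of the Logo. All names are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; assumed layout


def to_picture_coords(initial: Box, logo_in_crop: Box) -> Box:
    """Convert a Logo box measured inside the crop to full-picture coordinates."""
    offset_x, offset_y = initial[0], initial[1]
    x1, y1, x2, y2 = logo_in_crop
    return (x1 + offset_x, y1 + offset_y, x2 + offset_x, y2 + offset_y)


def adjust_position(initial: Box, logo_in_crop: Box) -> Box:
    """Shrink the detected score area so that it starts right after the Logo.

    Assumption: the score digits sit to the right of the Logo.
    """
    logo = to_picture_coords(initial, logo_in_crop)
    # Clamp the new left edge so the result stays inside the initial box.
    left = min(max(logo[2], initial[0]), initial[2])
    return (left, initial[1], initial[2], initial[3])
```

With an initial area `(100, 20, 220, 60)` and a Logo found at `(5, 5, 35, 35)` inside the crop, the adjusted area becomes `(135, 20, 220, 60)`. A real layout may place the Logo above or to the right of the digits, in which case the rule would differ.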
  • Step 408 Determine whether the game has started; if yes, go to Step 410; if not, return to Step 406.
  • The server determines whether the game has started by detecting the score area in each initial picture. For example, if no score area is detected in an initial picture, the game has not started; if a score area is detected, the game has started, and the scores in the initial pictures of the match can then be recognized.
  • Step 410 Identify the score of the score area in each initial picture in sequence.
  • this application uses a multi-label classification model.
  • the backbone network of the classification model can use a lightweight network suitable for mobile terminals, such as Mobilenet, etc., and the classification label contains the number of digits and the specific category (0-9) of each digit.
  • For example, for the score 21 the recognition model outputs [2, 2, 1]: the first element indicates that the score has two digits, the second element indicates that the first digit of the score is 2, and the third element indicates that the second digit of the score is 1.
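The label layout in this example can be decoded as sketched below. Only the layout itself (digit count followed by the digits, with a digit count of 0 standing for the background category) comes from the description; the function names and the use of `None` for background are assumptions.

```python
from typing import List, Optional


def decode_score(labels: List[int]) -> Optional[int]:
    """Decode [digit_count, d1, d2, ...] into a score; None means background."""
    if not labels:
        raise ValueError("empty label list")
    digit_count = labels[0]
    if digit_count == 0:
        return None  # background category: no digits in the region
    digits = labels[1:1 + digit_count]
    if len(digits) != digit_count or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("labels do not match the declared digit count")
    value = 0
    for digit in digits:
        value = value * 10 + digit
    return value


def encode_score(score: int) -> List[int]:
    """Inverse of decode_score, e.g. for building training labels."""
    if score < 0:
        raise ValueError("scores are assumed to be non-negative")
    digits = [int(ch) for ch in str(score)]
    return [len(digits)] + digits
```

`decode_score([2, 2, 1])` yields 21, and `encode_score(21)` gives back `[2, 2, 1]`.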
  • Step 412 Determine whether the score can be identified, if yes, go to Step 410, if not, go to Step 406.
  • After the scores in all initial pictures have been recognized, Step 414 is executed.
  • Step 414 Perform post-processing on the scores identified in all initial pictures.
  • post-processing such as background filtering, rule filtering, median filtering, and comprehensive score judgment can be performed to determine the accuracy of the identified scores.
  • the score post-processing is mainly aimed at the scene of target object recognition in the video. Due to the complex video background, it is difficult to guarantee 100% accuracy of score recognition.
  • the present invention also proposes a corresponding post-processing method:
  • Rule filtering: In game matches and real-world matches the score usually obeys certain rules, for example that the team score is greater than the individual score. When a score does not conform to the rules of the match in the given scenario, the score of that frame is considered misrecognized and can be replaced with the score of the previous frame.
  • Median filtering: Because scores in game matches and real-world matches change continuously, the scores can also be median filtered; that is, within a time window, each score is replaced by the median of the adjacent scores. This effectively removes isolated outliers and smooths the overall score sequence. For example, with a window of 5 and adjacent scores [5, 5, 8, 5, 5], the misrecognized score 8 in the third frame is corrected to 5 by the median filter.
  • the above post-processing algorithms will also be adjusted accordingly according to different game competitions or real competitions and different concerns, such as changing the filtering rules, changing the median filter window size, and so on.
  • the important information contained in the picture or video can be determined by judging the score of the picture or the score of the adjacent time period of the video; performed a kill operation), returned to the user or displayed directly.
  • Step 416 Determine a target picture based on the scores identified in all the initial pictures, and generate highlights according to the target picture.
  • The target object recognition method provided in this application uses a lightweight neural network model instead of the template matching of the prior art to detect the score area in a video or picture, and further uses the position of a specific logo to assist precise localization of the score area, so that the score area can be extracted quickly and accurately from pictures or videos of different scenes and layouts. In addition, the digits in the score area can also be recognized accurately with a lightweight neural network model.
  • FIG. 5 shows a schematic structural diagram of a target object recognition apparatus provided by an embodiment of the present application.
  • the device includes:
  • the initial position determination module 502 is configured to input the received initial picture into the first detection model, and obtain the initial position of one or more target objects in the initial picture;
  • the verification position determination module 504 is configured to input the candidate picture corresponding to the initial position into the second detection model, to obtain the verification object in the candidate picture and the verification position of the verification object in the candidate picture;
  • a target position determination module 506 configured to adjust the initial positions of the one or more target objects based on the verified positions to obtain target positions of the one or more target objects;
  • the target object obtaining module 508 is configured to input the target picture corresponding to the target position into the recognition model to obtain one or more target objects in the initial picture.
  • the device further includes:
  • the picture acquisition module is configured to receive the video to be processed, and extract i video frames from the video to be processed as initial pictures based on preset extraction rules, where i ∈ [1, n], and i is a positive integer.
  • the initial position determination module 502 is further configured to:
  • the target object obtaining module 508 is further configured to:
  • the initial position determination module 502 is further configured to:
  • the initial position determination module 502 is further configured to:
  • the target picture corresponding to the target position is input into the recognition model, and if the picture background of the i-th initial picture does not meet the predetermined condition, the one or more target objects in the (i-1)-th initial picture are used as the one or more target objects in the i-th initial picture.
  • the device further includes:
  • the first object adjustment module is configured to, when the one or more target objects in the i-th initial picture do not satisfy the preset target object recognition rule, use the one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
  • the device further includes:
  • the second object adjustment module configured as:
  • the target objects that do not meet the preset target object arrangement rules in each object sequence are used as adjustment objects;
  • the adjustment object is adjusted based on one or more target objects in initial pictures adjacent to the initial picture corresponding to the adjustment object.
  • the device further includes:
  • the target object includes a first target object and a second target object
  • the third object adjustment module is configured to:
  • the first target object in each initial picture is displayed.
  • the device further includes:
  • the target video generation module configured as:
  • a target video is generated based on the target initial picture, and the target video is sent to the user.
  • Through multiple detections and logo-assisted positioning, the target object recognition apparatus locates the score area and the logo more precisely, improving the positional accuracy of the target object and achieving pixel-level control over it;
  • the lightweight network model allows target objects to be extracted and recognized quickly from complex, diverse pictures or videos of different versions.
  • The above is a schematic solution of a target object recognition apparatus according to this embodiment. It should be noted that the technical solution of the apparatus and that of the above target object recognition method belong to the same concept; for details not described in the technical solution of the apparatus, refer to the description of the technical solution of the method.
  • FIG. 6 shows a structural block diagram of a computing device 600 provided according to an embodiment of the present specification.
  • Components of the computing device 600 include, but are not limited to, memory 610 and processor 620.
  • the processor 620 is connected to the memory 610 through the bus 630, and the database 650 is used for saving data.
  • Computing device 600 also includes access device 640 that enables computing device 600 to communicate via one or more networks 660 .
  • networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet.
  • Access device 640 may include one or more of any type of network interface (e.g., a network interface card (NIC)), wired or wireless, such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and the like.
  • The components of computing device 600 may also be connected to each other, for example through a bus.
  • The structural block diagram of the computing device shown in FIG. 6 is only an example and does not limit the scope of this specification. Those skilled in the art can add or replace other components as required.
  • Computing device 600 may be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable computing devices (e.g., smart watches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or PCs.
  • Computing device 600 may also be a mobile or stationary server.
  • the processor 620 is configured to execute the following computer-executable instructions, and when the processor 620 executes the instructions, the steps of the target object recognition method are implemented.
  • The above is a schematic solution of a computing device according to this embodiment. It should be noted that the technical solution of the computing device and that of the above target object recognition method belong to the same concept; for details not described in the technical solution of the computing device, refer to the description of the technical solution of the method.
  • An embodiment of the present application further provides a computer-readable storage medium, which stores computer instructions, and when the instructions are executed by a processor, implements the steps of the aforementioned target object identification method.
  • the above is a schematic solution of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above-mentioned target object identification method belong to the same concept, and the details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned target object identification method. .
  • An embodiment of the present application further provides a computer program product, which, when the computer program product is executed in a computer, causes the computer to execute the steps of the aforementioned target object identification method.
  • the computer instructions include computer program product code, which may be in source code form, object code form, an executable file, some intermediate form, or the like.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program product code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like.
  • the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

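This application provides a target object recognition method and apparatus. The method includes: inputting a received initial picture into a first detection model to obtain the initial positions of one or more target objects in the initial picture; inputting the candidate picture corresponding to each initial position into a second detection model to obtain a verification object in the candidate picture and the verification position of the verification object in the candidate picture; adjusting the initial positions of the one or more target objects based on the verification positions to obtain target positions of the one or more target objects; and inputting the target picture corresponding to each target position into a recognition model to obtain the one or more target objects in the initial picture. The method uses detection models to extract target objects quickly and accurately from pictures or videos of different scenes, and can also use the verification position to assist the final localization of the target object, thereby obtaining an accurate recognition result for the target object.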

Description

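Target object recognition method and apparatus
This application claims priority to Chinese patent application No. 202011529196.5, entitled "Target object recognition method and apparatus", filed with the China Patent Office on December 22, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a target object recognition method. This application also relates to a target object recognition apparatus, a computing device, a computer-readable storage medium, and a computer program product.
Background
With the spread of the Internet and mobile devices (such as mobile phones and tablet computers), demand for entertainment on mobile devices grows daily. People browse the web, watch videos, and play games on phones and other mobile devices. Users often want to clip highlight moments that involve themselves from games and matches, such as kills and assists. Video websites likewise want to recognize goals or other important information so that it can be displayed to attract users. To make pictures and videos easier to process, the important information in them (for example, the score of a match) needs to be recognized. However, existing methods for recognizing important information in pictures or videos do not adapt well to varied application scenarios, and their recognition accuracy is poor.
Summary
In view of this, embodiments of this application provide a target object recognition method. This application also relates to a target object recognition apparatus, a computing device, and a computer-readable storage medium, so as to remedy the technical defect in the prior art that important information in pictures or videos is recognized with poor accuracy.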
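According to a first aspect of the embodiments of this application, a target object recognition method is provided, including:
inputting a received initial picture into a first detection model to obtain the initial positions of one or more target objects in the initial picture;
inputting the candidate picture corresponding to the initial position into a second detection model to obtain a verification object in the candidate picture and the verification position of the verification object in the candidate picture;
adjusting the initial positions of the one or more target objects based on the verification position to obtain target positions of the one or more target objects;
inputting the target picture corresponding to the target position into a recognition model to obtain the one or more target objects in the initial picture.
According to a second aspect of the embodiments of this application, a target object recognition apparatus is provided, including:
an initial position determination module, configured to input a received initial picture into a first detection model to obtain the initial positions of one or more target objects in the initial picture;
a verification position determination module, configured to input the candidate picture corresponding to the initial position into a second detection model to obtain a verification object in the candidate picture and the verification position of the verification object in the candidate picture;
a target position determination module, configured to adjust the initial positions of the one or more target objects based on the verification position to obtain target positions of the one or more target objects;
a target object obtaining module, configured to input the target picture corresponding to the target position into a recognition model to obtain the one or more target objects in the initial picture.
According to a third aspect of the embodiments of this application, a computing device is provided, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor implements the steps of the target object recognition method when executing the instructions.
According to a fourth aspect of the embodiments of this application, a computer-readable storage medium is provided, which stores computer instructions that, when executed by a processor, implement the steps of the target object recognition method.
According to a fifth aspect of the embodiments of this application, a computer program product is provided which, when executed in a computer, causes the computer to perform the steps of the target object recognition method described above.
In the target object recognition method and apparatus provided by this application, the method includes: inputting a received initial picture into a first detection model to obtain the initial positions of one or more target objects in the initial picture; inputting the candidate picture corresponding to the initial position into a second detection model to obtain a verification object in the candidate picture and the verification position of the verification object in the candidate picture; adjusting the initial positions of the one or more target objects based on the verification position to obtain target positions of the one or more target objects; and inputting the target picture corresponding to the target position into a recognition model to obtain the one or more target objects in the initial picture. Specifically, the method uses a first detection model and a second detection model that are pre-trained lightweight neural networks, so that target objects can be extracted quickly and accurately from pictures or videos of different scenes and layouts; the verification position can additionally assist the final localization of the target object, so that the recognition model yields a more accurate recognition result.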
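Brief Description of the Drawings
FIG. 1 is a schematic diagram of a specific application structure of a target object recognition method provided by an embodiment of this application;
FIG. 2 is a flowchart of a target object recognition method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of an initial picture in a target object recognition method provided by an embodiment of this application;
FIG. 4 is a flowchart of a target object recognition method provided by an embodiment of this application, applied in a game match scenario;
FIG. 5 is a schematic structural diagram of a target object recognition apparatus provided by an embodiment of this application;
FIG. 6 is a structural block diagram of a computing device provided by an embodiment of this application.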
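Detailed Description
Many specific details are set forth in the following description to facilitate a full understanding of this application. However, this application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; this application is therefore not limited by the specific implementations disclosed below.
The terminology used in one or more embodiments of this application is for the purpose of describing particular embodiments only and is not intended to limit them. The singular forms "a", "said" and "the" used in one or more embodiments of this application and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this application refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this application to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this application, "first" may also be called "second", and similarly "second" may also be called "first". Depending on the context, the word "if" as used here may be interpreted as "at the time of", "when", or "in response to determining".
First, the terms involved in one or more embodiments of this application are explained.
Template matching: a technique for finding, in one picture, the part most similar to another template picture, usually by traditional picture processing methods such as sliding windows.
Object detection: finding all objects of interest in a picture through template matching or a neural network.
OCR: Optical Character Recognition, the process of analyzing and recognizing picture files of text material to obtain text and layout information.
MobileNet: a lightweight network.
SSD: Single Shot MultiBox Detector, an object detection algorithm.
Faster-RCNN: a CNN (Convolutional Neural Network) object detection method; a fully end-to-end CNN object detection model.
Logo: a design term referring to the mark that a product, enterprise, website, etc. designs for its own theme or activities.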
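In the prior art, important information such as game characters and kill prompts usually has to be extracted from game pictures or game videos. Because users generally care more about information involving themselves, other key parts, such as the game character the user is playing, often also have to be recognized in order to decide whether a piece of information is of that kind. Template matching, classification algorithms and the like are typically used to extract this information. However, recognition based on game characters and kill prompts is generally limited to game pictures or game videos, and it requires the features of all characters from the outset in order to distinguish different information. Because game updates introduce new characters or skins quite frequently, the recognition model has to be updated often, which greatly increases model update and labor costs; if the model is not updated in time, the accuracy of character and kill recognition is very low.
In pictures or videos of other kinds of matches, important information such as scores is extracted, usually also by template matching: the score area is recognized, and changes in the score are used to locate the important information in the picture or video. Unlike the character and kill prompt recognition above, score recognition can be applied to a wider variety of posters and videos of games, matches and so on, and does not require collecting character information to adjust the model. However, with pictures or videos of complex layout, the template matching methods in common use today struggle to locate the score precisely, and complex backgrounds often cause important information to be misrecognized, giving a poor user experience.
On this basis, this application provides a target object recognition method, and also relates to a target object recognition apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to FIG. 1, FIG. 1 shows a schematic diagram of a specific application structure of a target object recognition method according to an embodiment of this application.
Specifically, the method provided by the embodiments of this application is applied on a computer, a server, or a cloud service. The application scenario of FIG. 1 includes a CPU (Central Processing Unit)/GPU (Graphics Processing Unit) 101, a data storage module 103, a preprocessing module 105, a score area detection module 107, a digit recognition module 109, and an information extraction module 111. The CPU/GPU 101 starts working and obtains the video or picture to be processed from the data storage module 103. It then controls the preprocessing module 105 to extract the key frames to be recognized from the video to be processed and to preprocess the pictures or key frames according to the input requirements of the score area detection module 107. The pictures or key frames are then input into the score area detection module 107, which detects the score areas in them and performs auxiliary positioning. The final score areas are input into the digit recognition module 109, which detects the digits in each score area in order to recognize them. Finally, the recognized digits are input into the information extraction module 111, which post-processes them to obtain the overall score recognition result for the picture or video to be processed and structures this overall result so that it can be displayed or recommended to the user.
The target object recognition method provided in this application uses a lightweight neural network model instead of the template matching of the prior art to detect the score area in a video or picture, and further uses the position of a specific logo to assist precise localization of the score area, so that the score area can be extracted quickly and accurately from pictures or videos of different scenes and layouts. In addition, the digits in the score area can also be recognized accurately with a lightweight neural network model.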
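Referring to FIG. 2, FIG. 2 shows a flowchart of a target object recognition method according to an embodiment of this application, which specifically includes the following steps:
Step 202: Input the received initial picture into a first detection model to obtain the initial positions of one or more target objects in the initial picture.
Specifically, the method can be applied in game scenarios to recognize game scores, in entertainment competition scenarios to recognize match scores, or in any other scenario in which scores need to be recognized; this application places no limitation on this. For ease of understanding, the following embodiments all describe in detail the case in which the method is applied in a game scenario to recognize the scores in a game match.
The initial picture includes, but is not limited to, a picture of any type and any content, for example a game picture, a match picture, or a picture formed from a video frame; the target object can be understood as a score.
In a specific implementation, the received initial picture is input into the first detection model, which yields the initial positions of one or more scores in the initial picture. The first detection model includes, but is not limited to, a MobileNet-based SSD model; MobileNet is a lightweight network suited to mobile terminals, and SSD, as a one-stage detection network, is faster than two-stage detection networks such as Faster-RCNN.
Referring to FIG. 3, FIG. 3 shows a schematic diagram of an initial picture in a target object recognition method according to an embodiment of this application.
FIG. 3 is a game picture from a game scenario. The game picture includes the scores of the game match, such as the individual score and the team score.
In practice, inputting the game picture into the MobileNet-based SSD model yields the initial positions of the various scores in the game picture, for example initial position 1 of score 1, initial position 2 of score 2, initial position 3 of score 3, and initial position 4 of score 4 in FIG. 3.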
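Step 204: Input the candidate picture corresponding to the initial position into a second detection model to obtain a verification object in the candidate picture and the verification position of the verification object in the candidate picture.
The first detection model and the second detection model may be detection models of the same type or of different types. In practice, the first detection model is used to recognize score positions, so during training its samples are game pictures and the corresponding labels are the positions of the scores in those pictures. In practice there is usually a logo icon next to each score that indicates what the score means, for example an individual score, a team score, or a kill score. The second detection model is therefore used to recognize the logo icon next to each score position output by the first detection model; during training its samples are game pictures containing scores, and the corresponding labels are the logo icon of each score and the position of that logo icon.
In a specific implementation, once the initial position of each target object has been obtained, the candidate picture corresponding to that initial position is input into the second detection model, which yields the logo icon corresponding to the target object in the candidate picture and the verification position of that logo icon.
Still taking FIG. 3 as an example, after initial position 1 of score 1, initial position 2 of score 2, initial position 3 of score 3 and initial position 4 of score 4 are obtained, the parts of the picture in FIG. 3 corresponding to initial positions 1 to 4 are input into the second detection model. The second detection model yields "VS" and its position in the area of initial position 1, the "knife" and its position in the area of initial position 2, the "circle" and its position in the area of initial position 3, and the "fist" and its position in the area of initial position 4.
Step 206: Adjust the initial positions of the one or more target objects based on the verification position to obtain target positions of the one or more target objects.
Specifically, after the verification position is obtained, it is used to adjust the initial position of the corresponding target object to obtain the target position of that target object, so that the target object can subsequently be recognized accurately on the basis of an accurate target position.
In practice, the verification position is the position of the logo icon adjacent to the target object, so the position of each corresponding target object can be adjusted based on the position of that logo icon. Specifically, the logo position and the initial position of the score area are used together to locate the exact target position of the score. Compared with taking the larger initial score area recognized by the first detection model and then cutting it, this is more accurate and faster, and an accurate score position benefits the subsequent recognition of the digits at that position.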
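Step 208: Input the target picture corresponding to the target position into a recognition model to obtain the one or more target objects in the initial picture.
The recognition model includes, but is not limited to, a multi-label classification model, which can use a lightweight network suited to mobile terminals, such as MobileNet. The classification labels output by the model contain the number of digits and the specific category (0 to 9) of each digit. For example, if the target object at the target position is the score 21 and the target picture containing it is input into the recognition model, the model outputs [2, 2, 1]: the first element, 2, indicates that the score has two digits; the second element, 2, indicates that the first digit of the score is 2; and the third element, 1, indicates that the second digit of the score is 1. Such a multi-label classification model can recognize numbers with a variable number of digits and can also recognize a background category containing no digits, in which case the number of digits is 0.
In a specific implementation, an initial picture contains multiple target objects, each with its own initial position. The candidate picture corresponding to the initial position of each target object is input into the second detection model, which yields the verification object in that candidate picture and its verification position. The initial position of each corresponding target object is then adjusted for accuracy based on the verification position to obtain the target position of each target object. Finally, the target picture corresponding to each target position is input into the recognition model to obtain the target object in that target picture; collecting the target objects from the target pictures of all target positions determines all the target objects in the initial picture.
In the embodiments of this specification, through multiple detections and logo-assisted positioning, the target object recognition method locates the score area and the logo more precisely, improving the positional accuracy of the target object and achieving pixel-level control over it. With a lightweight network model, target objects in complex, diverse pictures or videos of different versions can be extracted and recognized quickly on mobile terminals.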
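In practice, the initial picture may be a video frame of a video. In that case, before the received initial picture is input into the first detection model, the method further includes:
receiving a video to be processed, and extracting i video frames from the video to be processed as initial pictures based on a preset extraction rule, where i ∈ [1, n] and i is a positive integer.
The preset extraction rule can be set according to the actual application, for example extracting one video frame every one, two or three seconds as an initial picture, or scoring every video frame with a video frame scoring model and taking the highest-scoring frames as initial pictures.
Specifically, before the received initial picture is input into the first detection model, the video to be processed is received, and i video frames are then extracted from it as initial pictures based on the preset extraction rule, where i ranges from 1 to n and is a positive integer. For example, if n is 5, then 5 video frames are extracted from the video to be processed as initial pictures based on the preset extraction rule.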
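A fixed-interval extraction rule of the kind mentioned above can be sketched as follows. The function name and parameters are hypothetical, and the sketch only computes which frame indices to take; decoding the frames themselves is left to whatever video library is in use.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float,
                         interval_seconds: float = 1.0) -> List[int]:
    """Indices of the frames taken every `interval_seconds` of video."""
    if total_frames < 0 or fps <= 0 or interval_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval must be > 0")
    # Number of frames between two sampled frames, at least one.
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))
```

For a hypothetical 100-frame clip at 25 frames per second sampled once per second, the indices are 0, 25, 50 and 75, giving n = 4 initial pictures.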
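In the embodiments of this specification, the target object recognition method can be applied to target object recognition in videos: certain video frames of the video to be processed are taken as initial pictures so that the target objects in those frames can be recognized accurately.
In another embodiment of this specification, inputting the received initial picture into the first detection model to obtain the initial positions of one or more target objects in the initial picture includes:
inputting the received i-th initial picture into the first detection model to obtain the initial positions of one or more target objects in the i-th initial picture;
correspondingly, inputting the target picture corresponding to the target position into the recognition model to obtain the target objects in the initial picture includes:
inputting the target picture corresponding to the target position into the recognition model to obtain the one or more target objects in the i-th initial picture;
determining whether i is greater than n; if so, collecting the one or more target objects in each initial picture;
if not, incrementing i by 1 and continuing to input the received i-th initial picture into the first detection model.
When the initial pictures are several video frames of the video to be processed, the one or more target objects in each initial picture are recognized using the target object recognition method of this application.
Taking i = 1 as an example, the received first initial picture is input into the first detection model to obtain the initial position of each target object in the first initial picture. The candidate picture corresponding to the initial position of each target object is input into the second detection model to obtain the verification object in the candidate picture and its verification position. The initial position of each target object is adjusted based on the verification position to obtain the target position of each target object. The target picture corresponding to each target position is input into the recognition model to obtain all target objects in the first initial picture. Once these are obtained, it is determined whether the first initial picture is the last initial picture of the video to be processed, i.e., whether i is greater than n. If so, all target objects in the first initial picture are collected; if not, the second initial picture is input into the first detection model and the above steps are repeated, looping until all target objects in every initial picture formed from the video frames extracted from the video to be processed have been recognized.
After the target objects in all initial pictures have been recognized, all target objects in each initial picture are collected; continuing the example above, the scores in each game picture, such as the individual score, team score and kill score, are collected.
Specifically, for the way one or more target objects are recognized in the initial picture formed from each video frame extracted from the video to be processed, refer to the steps for recognizing target objects in a single initial picture in the embodiment above; they are not repeated here.
In the embodiments of this specification, the target object recognition method can recognize one or more target objects not only in a single initial picture but also in each initial picture formed from key frames extracted from the video to be processed. Recognition in every key frame uses multiple detections and logo-assisted positioning to locate the score area and the logo more precisely, improving the positional accuracy of the target object and achieving pixel-level control over it. With a lightweight network model, target objects in complex, diverse videos of different versions can be extracted and recognized quickly on mobile terminals.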
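In addition, inputting the received i-th initial picture into the first detection model to obtain the initial positions of one or more target objects in the i-th initial picture includes:
inputting the received i-th initial picture into the first detection model;
determining whether the i-th initial picture includes a target object;
if so, obtaining the initial positions of one or more target objects in the i-th initial picture;
if not, incrementing i by 1 and continuing to input the received i-th initial picture into the first detection model.
In practice, some video frames extracted from the video to be processed may contain no target object. In a game video, for example, the frames at the start of the video introduce the game; because the game has not yet started, these frames contain no target object, i.e., no score.
To avoid useless work, each initial picture is therefore checked for target objects as soon as it is received. If the initial picture contains no target object, detection moves on to the next initial picture. This avoids performing position acquisition, target object recognition and other subsequent operations on pictures that contain no target object, which would waste system processing resources and give the user a poor experience.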
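The skip logic above can be sketched as a simple loop. The two callables stand in for the detection and recognition models and are placeholders, not real model APIs; recording `None` for skipped pictures is also an assumption made so that results stay aligned with picture indices.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Picture = TypeVar("Picture")
Region = TypeVar("Region")


def recognise_pictures(
    pictures: Sequence[Picture],
    detect: Callable[[Picture], Optional[Region]],
    recognise: Callable[[Region], int],
) -> List[Optional[int]]:
    """Run recognition only on pictures in which a target object is detected."""
    results: List[Optional[int]] = []
    for picture in pictures:
        region = detect(picture)
        if region is None:
            # No target object (e.g. the game has not started): skip the
            # position adjustment and recognition steps for this picture.
            results.append(None)
            continue
        results.append(recognise(region))
    return results
```

In use, `detect` would wrap the first detection model (returning `None` when no score area is found) and `recognise` would wrap the position adjustment and recognition model.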
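In a specific implementation, when the target object in an initial picture is occluded, the recognition model may recognize the background that occludes it, so the target object itself cannot be recognized correctly. To meet practical needs and obtain a target object for every initial picture, the target object of the previous initial picture can be used instead, as follows:
inputting the target picture corresponding to the target position into the recognition model to obtain the one or more target objects in the i-th initial picture includes:
inputting the target picture corresponding to the target position into the recognition model, and if the picture background of the i-th initial picture does not meet a predetermined condition, taking the one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
The predetermined condition includes, but is not limited to, the target object being detectable or not being occluded.
In a game scenario, a game character may occlude the score area while moving. In that case the picture background of the initial picture does not meet the predetermined condition: when the picture at the score position is input into the recognition model, the model can only recognize the occluding game character, i.e., the picture background. The score of the immediately preceding initial picture is then used as the score of this initial picture.
In the embodiments of this specification, the initial pictures formed from the video frames of the video to be processed have a certain continuity, and the target objects of two adjacent initial pictures generally do not differ greatly. To ensure that a target object is obtained for every initial picture, when the recognition model cannot recognize the target object in an initial picture, the target object of the immediately preceding initial picture is substituted, so as to meet the needs of subsequent applications based on the target object of each initial picture (for example, selecting key initial pictures based on the score).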
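A minimal sketch of this substitution follows, assuming that a background-only recognition result is represented as `None` (a representation chosen here for illustration, not taken from the description).

```python
from typing import List, Optional


def carry_forward(results: List[Optional[int]]) -> List[Optional[int]]:
    """Fill background-only results with the previous picture's score."""
    filled: List[Optional[int]] = []
    last: Optional[int] = None
    for result in results:
        if result is None:
            # Occluded or unreadable: reuse the most recent known score.
            # Stays None if no earlier picture had a score.
            result = last
        else:
            last = result
        filled.append(result)
    return filled
```

For the hypothetical sequence `[None, 3, None, None, 4]` the output is `[None, 3, 3, 3, 4]`; the leading `None` remains because no earlier score exists.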
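In addition, after the one or more target objects in each initial picture are collected, the method further includes:
when the one or more target objects in the i-th initial picture do not satisfy a preset target object recognition rule, taking the one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
The preset target object recognition rule can be set according to the specific application scenario, and this application places no limitation on it. For example, in a game scenario where the target objects are the individual score and the team score, the rule may require that the team score be greater than the individual score.
Specifically, taking i = 3 as an example, when the one or more target objects in the third initial picture do not satisfy the preset target object recognition rule, the corresponding one or more target objects in the second initial picture are taken as the one or more target objects in the third initial picture.
Continuing the game example, an initial picture of the game video includes an individual score and a team score, and the team score must be greater than the individual score. If the individual score recognized in the third initial picture is greater than the team score, the individual score and team score in the third initial picture have been recognized incorrectly. In this case they are corrected by taking the individual score and team score of the second initial picture as the individual score and team score of the third initial picture.
In the embodiments of this specification, after the target objects of each initial picture in the video to be processed are obtained, the accuracy of the target object recognized in each initial picture is verified against the preset target object recognition rule. If a recognized target object does not satisfy the rule, recognition of the target object in that initial picture is considered to have failed; to keep the target objects recognized in each initial picture accurate, the failed result can then be replaced with the recognition result of the previous initial picture.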
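In another embodiment of this specification, in game or match scenarios, the scores in the initial pictures extracted from game and match videos are continuous; that is, within a given time window the scores in the initial pictures do not change greatly. If the scores in the initial pictures fluctuate strongly within a given time window, some of them may have been misrecognized, and the score in the affected initial picture needs to be corrected, as follows:
after the one or more target objects in each initial picture are collected, the method further includes:
dividing the one or more target objects in all initial pictures into at least one object sequence according to a preset time period;
taking the target objects in each object sequence that do not satisfy a preset target object arrangement rule as adjustment objects;
adjusting each adjustment object based on one or more target objects in the initial pictures adjacent to the initial picture corresponding to that adjustment object.
The preset time period can be set according to actual needs; for example, it may be 5 seconds, 10 seconds, and so on.
Taking a preset time period of 5 seconds as an example, the initial pictures of the video to be processed are grouped at 5-second intervals and the target objects in all initial pictures within each 5 seconds are obtained. It is then judged whether the arrangement of the target objects within each 5 seconds satisfies the preset target object arrangement rule. If so, the target objects in all initial pictures within those 5 seconds are considered accurate; if not, an abnormal target object exists among them. The target object in the initial picture containing the abnormal target object can then be adjusted according to the target objects of the adjacent initial pictures, for example by median filtering.
The preset target object arrangement rule can be set according to the specific application scenario, and this application places no limitation on it. For example, in a game scenario the rule may require that the target objects increase or remain unchanged as match time elapses.
Continuing the example, in a game scenario, as those skilled in the art understand a normal game match, the individual score and team score in the initial pictures either remain unchanged or increase over time. For example, with a preset time period of 5 seconds, suppose the scores of the initial pictures obtained within 5 seconds are arranged as [5, 5, 8, 5, 5]. The score recognized in the third initial picture is then probably incorrect and needs to be adjusted using the target objects in the adjacent second and fourth initial pictures; for example, the score in the third initial picture is corrected to 5 by median filtering.
In the embodiments of this specification, after the target object in each initial picture is obtained through the recognition model, its accuracy can be judged from the way the target object changes in the given application scenario. When the target object in an initial picture is recognized incorrectly, it can be corrected by median filtering, which further ensures the accuracy of the target objects in the initial pictures.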
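In another embodiment of this specification, the target object includes a first target object and a second target object;
correspondingly, after the one or more target objects in each initial picture are collected, the method further includes:
receiving an acquisition request for the first target object, and determining the first target object and the second target object in each initial picture based on the acquisition request, where the second target object is associated with the first target object;
displaying the first target object in each initial picture when the second target object in that initial picture is updated as the first target object increases.
Continuing the example, in a game scenario the target object is the score, the first target object is the individual score, and the second target object is the team score. In a given game, the user pays most attention to the moments of individual kills. After the individual and team scores in each initial picture have been recognized and filtered, the team kill score should increase whenever the individual kill score increases. The individual kill score and team kill score in each initial picture can therefore be judged together to filter out misrecognized scores, so that correct individual kill information is finally returned to the user.
In the embodiments of this specification, when the user cares about a particular target object, its accuracy can be checked through the association between that target object and other target objects in the initial picture, so that wrongly recognized target objects are filtered out and adjusted. Displaying only accurate target objects that the user cares about can greatly improve the user experience.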
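In the embodiments of this specification, the method further includes:
extracting the one or more target objects in the initial pictures, and taking the initial pictures whose target objects satisfy a preset extraction condition as target initial pictures;
generating a target video based on the target initial pictures, and sending the target video to the user.
The preset extraction condition may be set according to the actual application; for example, the condition may be that the target object is greater than or equal to a preset target object threshold, that is, a minimum score that must be reached.
In a specific implementation, after the one or more target objects in each initial picture have been filtered, recognized, and adjusted in the ways described above, the initial pictures whose target objects satisfy the preset extraction condition are selected from all initial pictures as target pictures, and a target video generated from these target pictures is sent to the user.
Continuing the example above, if the preset extraction condition is an individual score greater than 80 points, the initial pictures with an individual score greater than 80 points are extracted as target pictures, and a highlight video generated from these target pictures is recommended to the user.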
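The selection step in the 80-point example amounts to a threshold filter over the per-picture scores. The sketch below covers only that selection; the `(frame_index, score)` pair layout and the function name are assumptions, and assembling the selected pictures into a video is left out.

```python
def select_target_pictures(frame_scores, threshold=80):
    """Return the frame indices whose individual score exceeds the threshold.

    frame_scores -- iterable of (frame_index, individual_score) pairs
    threshold    -- preset extraction threshold; strictly-greater comparison,
                    matching the "greater than 80 points" example
    """
    return [index for index, score in frame_scores if score > threshold]


if __name__ == "__main__":
    scores = [(0, 75), (1, 80), (2, 81), (3, 95)]
    print(select_target_pictures(scores))
```

If the extraction condition is meant to be "greater than or equal to", only the comparison operator changes.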
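In the embodiments of this specification, after the one or more target objects in each initial picture are obtained, a target video or the like that interests the user can be generated from the target objects according to actual application requirements, so as to increase the user's attention to the video.
The target object recognition method is further described below with reference to FIG. 4, taking its application in a game match scenario as an example. The method specifically includes the following steps:
Step 402: Input a game video.
Step 404: Extract multiple key frames from the game video as initial pictures based on a preset extraction rule.
Step 406: Detect and locate the score region in each initial picture.
Specifically, score region detection is key region detection, where a key region is a region of a picture or video that contains important information and that users often pay attention to. For example, in poster pictures and videos of game matches and real-world matches (football, basketball), users tend to focus on the score region. Because pictures and videos currently differ widely in resolution, size, layout, and interface, it is difficult to locate key regions with template matching as in the prior art. In addition, to meet the need for fast detection of key regions in pictures or videos on mobile devices, the present application uses a lightweight detection model, namely an SSD model based on Mobilenet. Mobilenet is a lightweight network suited to mobile devices, and SSD, as a one-stage detection network, is faster than two-stage networks such as Faster-RCNN.
Although a lightweight detection model is fast, its accuracy often suffers to some extent, and the located score region may be offset, which affects the subsequent recognition of the score digits. To solve this problem, the present application additionally detects the distinctive logo in each picture or video frame and uses the position of the logo to assist score region localization across videos with different interfaces. Through score region detection and assisted localization, the specific positions of the score regions to be recognized in the picture or video are obtained and then passed to the subsequent recognition model for score recognition.
Step 408: Determine whether the game has started. If so, perform step 410; if not, continue to perform step 406.
In a specific implementation, the server determines whether the game has started by detecting the score region in each initial picture. For example, if no score region is detected in an initial picture, the game is determined not to have started; if a score region is detected, the game is determined to have started, and the scores in the initial pictures of the game match can then be recognized.
Step 410: Recognize, in order, the score in the score region of each initial picture.
Specifically, many character recognition techniques based on LSTM and CTC are commonly used for score recognition. However, these techniques are time-consuming on mobile devices and hurt the user experience. Given the lightweight nature of digit recognition, the present application uses a multi-label classification model. The backbone of the classification model may be a lightweight network suited to mobile devices, such as Mobilenet, and the classification labels cover the number of digits and the class (0 to 9) of each digit. For example, for a score of 21 the recognition model outputs [2,2,1], where the first element indicates that the score has two digits, the second element, 2, indicates that the first digit of the score is 2, and the third element, 1, indicates that the second digit of the score is 1. This multi-label classification approach can recognize numbers with a variable number of digits and can also recognize a background class that contains no digits; when the background class is recognized, the number of digits can be set to 0.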
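Decoding the multi-label output back into a score can be sketched as below. The layout `[digit_count, d1, d2, ...]` follows the [2,2,1] example; the function name, the use of `None` for the background class, and the tolerance for unused trailing digit slots are assumptions of this sketch, not details stated in the text.

```python
def decode_score(labels):
    """Turn a multi-label prediction [digit_count, d1, d2, ...] into an int.

    Returns None when digit_count is 0, i.e. the background class
    (no digits visible in the score region).
    """
    if not labels:
        raise ValueError("empty prediction")
    digit_count = labels[0]
    if digit_count == 0:
        return None  # background: no score in this picture
    digits = labels[1:1 + digit_count]  # ignore unused trailing slots
    if len(digits) < digit_count:
        raise ValueError("prediction has fewer digits than digit_count")
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("digit class out of range 0..9")
    value = 0
    for d in digits:
        value = value * 10 + d
    return value


if __name__ == "__main__":
    print(decode_score([2, 2, 1]))  # two digits: 2 then 1
    print(decode_score([0]))        # background
```

Returning `None` for the background class lets later post-processing tell "no score visible" apart from a genuine score of 0.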
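Step 412: Determine whether the score can be recognized. If so, perform step 410; if not, perform step 406.
Specifically, when recognizing the score in the score region of each initial picture, it must be determined whether the score in that initial picture can be recognized. If so, recognition continues with the next initial picture until the scores in all initial pictures have been recognized. If not, the score region in that initial picture is located again and score region recognition is repeated, so that the scores in all initial pictures are recognized. After the scores in all initial pictures have been recognized, step 414 is performed.
Step 414: Post-process the scores recognized in all initial pictures.
Specifically, as described in the above embodiments, after the scores in all initial pictures are obtained, post-processing such as background filtering, rule filtering, median filtering, and comprehensive score judgment can be applied to determine the accuracy of the recognized scores.
In a specific implementation, score post-processing mainly targets target object recognition in video. Because video backgrounds are complex, score recognition can hardly be guaranteed to be 100% accurate.
Within a video, however, the score changes essentially continuously while the game is in progress. Therefore, after aggregating the scores across the video, the present application further proposes the following post-processing methods:
Background filtering: The digit recognition model recognizes either a score or background. When background is recognized in a frame, the score region of that frame may be unrecognizable because it is occluded or for other reasons, and the frame can reuse the score of the previous frame.
Rule filtering: In game matches or real-world matches, scores usually follow certain rules, for example that the team score is greater than the individual score. For a given scenario, when a score does not conform to the rules of the game match or real-world match, the score recognized in that frame can be considered wrong and can be replaced with the score of the previous frame.
Median filtering: Because scores in game matches or real-world matches change continuously, median filtering can also be used to filter the scores; that is, within a given time window, the original score is replaced with the median of the neighboring scores. This method effectively filters out isolated outliers and smooths the overall score. For example, with a time window of 5 and neighboring scores of [5,5,8,5,5], the misrecognized score 8 in the third frame is corrected to 5 by median filtering.
Comprehensive score judgment: Finally, when the target score of interest changes, the method combines the various score judgments and produces structured output. For example, in a certain game users care most about the moment of an individual kill, that is, the individual kill score. After the scores have been recognized and filtered, the team kill count should increase whenever the individual kill count increases. The algorithm therefore judges the individual kill score together with the team kill score, filters out misrecognitions, and finally returns the correct individual kill score to the user.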
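Of the four post-processing items above, background filtering is the simplest to illustrate: a frame recognized as background inherits the score of the previous frame. In this sketch a background frame is represented by `None`, which is an assumption of the example, as is the function name.

```python
def fill_background_frames(scores):
    """Carry the last recognized score forward over background frames.

    scores -- per-frame scores in order; None marks a frame where the
              recognition model reported background instead of digits
    """
    filled = []
    last_seen = None  # stays None until the first real score appears
    for score in scores:
        if score is None:
            filled.append(last_seen)
        else:
            last_seen = score
            filled.append(score)
    return filled


if __name__ == "__main__":
    print(fill_background_frames([None, 3, None, None, 4]))
```

Leading background frames stay `None` because there is no earlier score to reuse; that situation also matches the "game has not started" case of step 408.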
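In practical applications, the above post-processing algorithms are adjusted according to the particular game match or real-world match and the focus of interest, for example by changing the filtering rules or the median filter window size.
Finally, after the picture or video has been analyzed, the important information contained in it can be determined from the scores in the picture or from the scores in adjacent time periods of the video. This information is then structured (for example, that the user performed a kill at a certain point in time) and returned to the user or displayed directly.
Step 416: Determine target pictures based on the scores recognized in all initial pictures, and generate highlight moments from the target pictures.
The target object recognition method provided by the present application uses a lightweight neural network model, in place of the template matching of the prior art, to detect the score region in a video or picture, and additionally uses the position of a specific logo to assist precise localization of the score region, so that score regions can be extracted quickly and accurately from pictures or videos of different scenarios and layouts. In addition, the digits in the score region can also be recognized accurately with a lightweight neural network model.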
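Corresponding to the above method embodiments, the present application further provides an embodiment of a target object recognition apparatus. FIG. 5 is a schematic structural diagram of a target object recognition apparatus according to an embodiment of the present application. As shown in FIG. 5, the apparatus includes:
an initial position determination module 502, configured to input a received initial picture into a first detection model to obtain initial positions of one or more target objects in the initial picture;
a verification position determination module 504, configured to input a candidate picture corresponding to the initial positions into a second detection model to obtain a verification object in the candidate picture and a verification position of the verification object in the candidate picture;
a target position determination module 506, configured to adjust the initial positions of the one or more target objects based on the verification position to obtain target positions of the one or more target objects;
a target object obtaining module 508, configured to input a target picture corresponding to the target positions into a recognition model to obtain the one or more target objects in the initial picture.
Optionally, the apparatus further includes:
a picture acquisition module, configured to receive a to-be-processed video and extract i video frames from the to-be-processed video as initial pictures based on a preset extraction rule, where i∈[1,n] and i is a positive integer.
Optionally, the initial position determination module 502 is further configured to:
input the received i-th initial picture into the first detection model to obtain initial positions of one or more target objects in the i-th initial picture;
correspondingly, the target object obtaining module 508 is further configured to:
input the target picture corresponding to the target positions into the recognition model to obtain the one or more target objects in the i-th initial picture;
determine whether i is greater than n, and if so, collect the one or more target objects in each initial picture;
if not, increment i by 1 and continue to input the received i-th initial picture into the first detection model.
Optionally, the initial position determination module 502 is further configured to:
input the received i-th initial picture into the first detection model;
determine whether the i-th initial picture includes a target object;
if so, obtain the initial positions of the one or more target objects in the i-th initial picture;
if not, increment i by 1 and continue to input the received i-th initial picture into the first detection model.
Optionally, the initial position determination module 502 is further configured to:
input the target picture corresponding to the target positions into the recognition model, and if the picture background of the i-th initial picture does not satisfy a predetermined condition, take the one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
Optionally, the apparatus further includes:
a first object adjustment module, configured to, when the one or more target objects in the i-th initial picture do not satisfy the preset target object recognition rule, take the one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
Optionally, the apparatus further includes:
a second object adjustment module, configured to:
divide the one or more target objects in all initial pictures into at least one object sequence according to a preset time period;
take, in each object sequence, a target object that does not satisfy the preset target object arrangement rule as an adjustment object;
adjust the adjustment object based on the one or more target objects in the initial pictures adjacent to the initial picture corresponding to the adjustment object.
Optionally, the apparatus further includes a third object adjustment module;
the target object includes a first target object and a second target object;
correspondingly, the third object adjustment module is configured to:
receive an acquisition request for the first target object, and determine the first target object and the second target object in each initial picture based on the acquisition request, where the second target object is associated with the first target object;
display the first target object in each initial picture when the second target object in each initial picture is updated in accordance with the increase of the first target object.
Optionally, the apparatus further includes:
a target video generation module, configured to:
extract the one or more target objects in the initial pictures, and take the initial pictures whose target objects satisfy a preset extraction condition as target initial pictures;
generate a target video based on the target initial pictures, and send the target video to the user.
In the embodiments of this specification, the target object recognition apparatus locates the score region more precisely through multiple detections and logo-assisted localization, improving the positional accuracy of the target object and achieving pixel-level control over it. It also uses lightweight network models, so that target objects in complex and varied pictures or videos of different versions can be extracted and recognized quickly on mobile devices.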
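The above is an illustrative solution of a target object recognition apparatus of this embodiment. It should be noted that the technical solution of the target object recognition apparatus and that of the target object recognition method described above belong to the same concept; for details not described in the technical solution of the apparatus, refer to the description of the technical solution of the target object recognition method.
FIG. 6 is a structural block diagram of a computing device 600 according to an embodiment of this specification. The components of the computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is connected to the memory 610 through a bus 630, and a database 650 is used to store data.
The computing device 600 further includes an access device 640 that enables the computing device 600 to communicate via one or more networks 660. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so on.
In an embodiment of this specification, the above components of the computing device 600 and other components not shown in FIG. 6 may also be connected to one another, for example through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 6 is for illustrative purposes only and does not limit the scope of this specification. Those skilled in the art may add or replace other components as needed.
The computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, personal digital assistant, laptop computer, notebook computer, or netbook), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch or smart glasses) or another type of mobile device, or a stationary computing device such as a desktop computer or PC. The computing device 600 may also be a mobile or stationary server.
The processor 620 is configured to execute computer-executable instructions, and when executing the instructions the processor 620 implements the steps of the target object recognition method.
The above is an illustrative solution of a computing device of this embodiment. It should be noted that the technical solution of the computing device and that of the target object recognition method described above belong to the same concept; for details not described in the technical solution of the computing device, refer to the description of the technical solution of the target object recognition method.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the target object recognition method described above.
The above is an illustrative solution of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and that of the target object recognition method described above belong to the same concept; for details not described in the technical solution of the storage medium, refer to the description of the technical solution of the target object recognition method.
An embodiment of the present application further provides a computer program product which, when executed in a computer, causes the computer to perform the steps of the target object recognition method described above.
The above is an illustrative solution of a computer program product of this embodiment. It should be noted that the technical solution of the computer program product and that of the target object recognition method described above belong to the same concept; for details not described in the technical solution of the computer program product, refer to the description of the technical solution of the target object recognition method.
Specific embodiments of the present application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program product code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program product code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be expanded or restricted as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to help explain the present application. The optional embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of the present application. These embodiments were selected and described in detail to better explain the principles and practical applications of the present application, so that those skilled in the art can understand and use it well. The present application is limited only by the claims and their full scope and equivalents.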

Claims (13)

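  1. A target object recognition method, comprising:
    inputting a received initial picture into a first detection model to obtain initial positions of one or more target objects in the initial picture;
    inputting a candidate picture corresponding to the initial positions into a second detection model to obtain a verification object in the candidate picture and a verification position of the verification object in the candidate picture;
    adjusting the initial positions of the one or more target objects based on the verification position to obtain target positions of the one or more target objects;
    inputting a target picture corresponding to the target positions into a recognition model to obtain the one or more target objects in the initial picture.
  2. The target object recognition method according to claim 1, further comprising, before the inputting of the received initial picture into the first detection model:
    receiving a to-be-processed video, and extracting i video frames from the to-be-processed video as initial pictures based on a preset extraction rule, where i∈[1,n] and i is a positive integer.
  3. The target object recognition method according to claim 2, wherein the inputting of the received initial picture into the first detection model to obtain the initial positions of the one or more target objects in the initial picture comprises:
    inputting the received i-th initial picture into the first detection model to obtain initial positions of one or more target objects in the i-th initial picture;
    correspondingly, the inputting of the target picture corresponding to the target positions into the recognition model to obtain the target objects in the initial picture comprises:
    inputting the target picture corresponding to the target positions into the recognition model to obtain the one or more target objects in the i-th initial picture;
    determining whether i is greater than n, and if so, collecting the one or more target objects in each initial picture;
    if not, incrementing i by 1 and continuing to input the received i-th initial picture into the first detection model.
  4. The target object recognition method according to claim 3, wherein the inputting of the received i-th initial picture into the first detection model to obtain the initial positions of the one or more target objects in the i-th initial picture comprises:
    inputting the received i-th initial picture into the first detection model;
    determining whether the i-th initial picture includes a target object;
    if so, obtaining the initial positions of the one or more target objects in the i-th initial picture;
    if not, incrementing i by 1 and continuing to input the received i-th initial picture into the first detection model.
  5. The target object recognition method according to claim 3 or 4, wherein the inputting of the target picture corresponding to the target positions into the recognition model to obtain the one or more target objects in the i-th initial picture comprises:
    inputting the target picture corresponding to the target positions into the recognition model, and if the picture background of the i-th initial picture does not satisfy a predetermined condition, taking the one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
  6. The target object recognition method according to claim 3 or 4, further comprising, after the collecting of the one or more target objects in each initial picture:
    when the one or more target objects in the i-th initial picture do not satisfy a preset target object recognition rule, taking the one or more target objects in the (i-1)-th initial picture as the one or more target objects in the i-th initial picture.
  7. The target object recognition method according to claim 3 or 4, further comprising, after the collecting of the one or more target objects in each initial picture:
    dividing the one or more target objects in all initial pictures into at least one object sequence according to a preset time period;
    taking, in each object sequence, a target object that does not satisfy a preset target object arrangement rule as an adjustment object;
    adjusting the adjustment object based on the one or more target objects in the initial pictures adjacent to the initial picture corresponding to the adjustment object.
  8. The target object recognition method according to claim 3 or 4, wherein the target object includes a first target object and a second target object;
    correspondingly, after the collecting of the one or more target objects in each initial picture, the method further comprises:
    receiving an acquisition request for the first target object, and determining the first target object and the second target object in each initial picture based on the acquisition request, where the second target object is associated with the first target object;
    displaying the first target object in each initial picture when the second target object in each initial picture is updated in accordance with the increase of the first target object.
  9. The target object recognition method according to any one of claims 1 to 8, further comprising:
    extracting the one or more target objects in the initial pictures, and taking the initial pictures whose target objects satisfy a preset extraction condition as target initial pictures;
    generating a target video based on the target initial pictures, and sending the target video to the user.
  10. A target object recognition apparatus, comprising:
    an initial position determination module, configured to input a received initial picture into a first detection model to obtain initial positions of one or more target objects in the initial picture;
    a verification position determination module, configured to input a candidate picture corresponding to the initial positions into a second detection model to obtain a verification object in the candidate picture and a verification position of the verification object in the candidate picture;
    a target position determination module, configured to adjust the initial positions of the one or more target objects based on the verification position to obtain target positions of the one or more target objects;
    a target object obtaining module, configured to input a target picture corresponding to the target positions into a recognition model to obtain the one or more target objects in the initial picture.
  11. A computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the target object recognition method according to any one of claims 1 to 9.
  12. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the target object recognition method according to any one of claims 1 to 9.
  13. A computer program product which, when executed in a computer, causes the computer to perform the steps of the method according to any one of claims 1 to 9.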
PCT/CN2021/120387 2020-12-22 2021-09-24 Target object recognition method and apparatus WO2022134700A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21908701.2A EP4206978A4 (en) 2020-12-22 2021-09-24 METHOD AND APPARATUS FOR IDENTIFYING A TARGET OBJECT
US18/131,993 US20230281990A1 (en) 2020-12-22 2023-04-07 Target Object Recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011529196.5 2020-12-22
CN202011529196.5A CN112560728B (zh) 2020-12-22 2020-12-22 Target object recognition method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/131,993 Continuation US20230281990A1 (en) 2020-12-22 2023-04-07 Target Object Recognition

Publications (1)

Publication Number Publication Date
WO2022134700A1 true WO2022134700A1 (zh) 2022-06-30

Family

ID=75031319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/120387 WO2022134700A1 (zh) 2020-12-22 2021-09-24 Target object recognition method and apparatus

Country Status (4)

Country Link
US (1) US20230281990A1 (zh)
EP (1) EP4206978A4 (zh)
CN (1) CN112560728B (zh)
WO (1) WO2022134700A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560728B (zh) * 2020-12-22 2023-07-11 上海幻电信息科技有限公司 Target object recognition method and apparatus

Also Published As

Publication number Publication date
EP4206978A4 (en) 2024-03-13
US20230281990A1 (en) 2023-09-07
EP4206978A1 (en) 2023-07-05
CN112560728B (zh) 2023-07-11
CN112560728A (zh) 2021-03-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21908701

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021908701

Country of ref document: EP

Effective date: 20230331

NENP Non-entry into the national phase

Ref country code: DE