The Fun of AI: “Magically” Creating Shapes on Screen with MediaPipe and OpenCV

磐创AI 2022-05-17

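As a software engineer, most of the time I feel that we are true magicians: we stitch together code snippets from different sources and make an application work.

Around that time, I went through Paul McWhorter's video tutorial on “MediaPipe”. When it comes to AI, he is one of the best teachers.

MediaPipe

MediaPipe offers open-source, cross-platform, customizable ML solutions for live and streaming media. In the video tutorial mentioned above, he demonstrates how to track the movement of hands and fingers with “MediaPipe Hands”. It uses machine learning (ML) to infer 21 3D landmarks of a hand from a single frame.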
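MediaPipe reports each landmark's x and y as fractions of the image width and height, so they have to be scaled to pixels before anything can be drawn. Here is a minimal sketch of that conversion, assuming a plain (x, y) tuple stands in for a landmark. The helper name `to_pixel` and the sample hand are hypothetical and not part of MediaPipe.

```python
INDEX_FINGER_TIP = 8  # landmark index of the index fingertip in MediaPipe Hands


def to_pixel(norm_x, norm_y, frame_width, frame_height):
    """Scale normalized [0.0, 1.0] coordinates to integer pixel coordinates."""
    return int(norm_x * frame_width), int(norm_y * frame_height)


# Hypothetical hand: 21 normalized (x, y) pairs, all at the frame centre
# except the index fingertip.
hand = [(0.5, 0.5)] * 21
hand[INDEX_FINGER_TIP] = (0.25, 0.75)

tip_px = to_pixel(*hand[INDEX_FINGER_TIP], frame_width=1280, frame_height=720)
print(tip_px)  # (320, 540)
```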

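The idea

Extend this work so that circles and rectangles “magically” appear on the screen. To be precise, when both hands appear in front of the camera, a circle appears around each index fingertip. Bring the hands closer until the circles touch each other, and BINGO! They merge into a single circle. Keep bringing the hands closer and the circle turns into a rectangle.

If this sounds interesting, read on!

Steps

The figures are drawn according to the distance between the index fingertips.

The steps are:

1. Use MediaPipe to find both hands and all the fingers.

2. Get the x & y coordinates of the index fingertip (landmark 8) of each hand.

3. Calculate the Euclidean distance between these two fingertips.

· If the distance is greater than twice the preset radius (r), draw a circle of radius r centered on each fingertip.

· If the distance is at most twice the radius but greater than the preset threshold at which the rectangle appears, draw a single circle enclosing both index fingertips.

· If the distance is less than or equal to the rectangle threshold, draw a rectangle with the index fingertips as diagonally opposite corners.

4. Use OpenCV to draw these figures.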
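The distance test in step 3 can be tried without a camera. This is a minimal sketch, assuming a radius of 200 px and a rectangle threshold of 300 px (the values used in the full program). `pick_figure` is a hypothetical name, and `math.dist` requires Python 3.8 or later.

```python
import math

CIR_RADIUS = 200   # preset radius r, in pixels
RECT_POINT = 300   # distance at or below which the rectangle appears


def pick_figure(tip1, tip2):
    """Return the figure to draw for two index fingertip positions."""
    dist = math.dist(tip1, tip2)  # Euclidean distance between the tips
    if dist > 2 * CIR_RADIUS:
        return "Circles"
    if dist > RECT_POINT:
        return "MergedCircle"
    return "Rectangle"


print(pick_figure((100, 360), (900, 360)))  # Circles (distance 800)
print(pick_figure((400, 360), (750, 360)))  # MergedCircle (distance 350)
print(pick_figure((500, 360), (700, 360)))  # Rectangle (distance 200)
```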

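Coding

The main libraries for this program are MediaPipe, OpenCV and NumPy. Install them with pip install (the PyPI packages are mediapipe, opencv-python and numpy). Using a virtual environment is strongly recommended.

The complete code is available on GitHub and is reproduced here: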

"""

A fun project to make circles & rectangle 'magically' appear on the screen

Platform: Windows 10

Python Version: 3.10+

Major libraries: MediaPipe, OpenCV, NumPy

"""

import cv2

import numpy as np

import math

# Camera settings

DEFAULT_CAM = 0 # Built-in camera

USB_CAM = 1 # External camera connected via USB port

CAM_SELECTED = DEFAULT_CAM

CAM_WIDTH = 1280

CAM_HEIGHT = 720

CAM_FPS = 30

FLIP_CAMERA_FRAME_HORIZONTALLY = True

# mediapipe parameters

MAX_HANDS = 2

DETECTION_CONF = 0.5

TRACKING_CONF = 0.5

MODEL_COMPLEX = 1

HAND_1 = 0

HAND_2 = 1

INDEX_FINGER_TIP = 8

X_COORD = 0

Y_COORD = 1

FIGURES_LIST = ["Circles", "MergedCircle", "Rectangle"]

# Drawing parameters - for opencv

CIR_RADIUS = 200

CIR_COLOR = (255, 0, 0)

CIR_THICKNESS = 3

MERG_CIR_COLOR = (0, 255, 0)

MERG_CIR_THICKNESS = 3

RECT_POINT = 300

RECT_COLOR = (0, 0, 255)

RECT_THICKNESS = 3

class MpHands:

  import mediapipe as mp

  def __init__(self, max_hands=MAX_HANDS, det_conf=DETECTION_CONF, complexity=MODEL_COMPLEX, track_conf=TRACKING_CONF):

      """

      Inputs:-

      static_image_mode: Mode of input. If set to False, the solution treats the input images as a video stream.

      max_num_hands: Maximum number of hands to detect. Default to 2

      model_complexity: Complexity of the hand landmark model. 0 or 1.

                        Landmark accuracy as well as inference latency generally go up with the model complexity.

                        Default to 1.

      min_detection_confidence: Minimum confidence value ([0.0, 1.0]) from the hand detection model for the

                                detection to be considered successful. Default to 0.5.

      min_tracking_confidence: Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the

                               hand landmarks to be considered tracked successfully.

                               Ignored if static_image_mode is True. Default to 0.5.

      Output:-

      multi_hand_landmarks: Collection of detected/tracked hands, where each hand is represented as a

                            list of 21 hand landmarks and each landmark is composed of x, y and z.

                            x and y are normalized to [0.0, 1.0] by the image width and height respectively.

      """

      self.hands = self.mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=max_hands,

                                                 model_complexity=complexity, min_detection_confidence=det_conf,

                                                 min_tracking_confidence=track_conf)

  def marks(self, video_frame):

      """

      Aim: To get the X & Y coordinates of all the 21 landmarks of both hands

      :param video_frame: captured frame from the opencv. This is in BGR format.

      :return my_hands: Array of hands with 21 landmarks (X & Y) of each hand

                        [[(h1_x0,h1_y0), (h1_x1,h1_y1), ...(h1_x20,h1_y20)],

                        [(h2_x0,h2_y0), (h2_x1,h2_y1), ...(h2_x20,h2_y20)], ...]

      """

      my_hands = []

      frame_rgb = cv2.cvtColor(video_frame, cv2.COLOR_BGR2RGB)  # opencv works in BGR, while rest of the world in RGB

      multi_hand_landmarks = self.hands.process(frame_rgb).multi_hand_landmarks

      if multi_hand_landmarks:    # Do the following if we have detected/tracked hands

          # multi_hand_landmarks is an array of arrays. Each array contains the 21 landmarks (in dict) of each hand

          for hand_landmarks in multi_hand_landmarks: # Stepping through each hand

              my_hand = []

              for land_mark in hand_landmarks.landmark:  # Stepping through the 21 landmarks of each hand

                  # landmark is a dict with x,y & z coordinates. We are interested in x & y only.

                  # Since x & y are normalized, multiply them with the actual frame width and height to get pixel values.

                  # Finally, convert the coordinates into integers for opencv

                  my_hand.append((int(land_mark.x * video_frame.shape[1]), int(land_mark.y * video_frame.shape[0])))

              my_hands.append(my_hand)

      return my_hands

def calc_euclidean_dist(p1, p2):

  """

  Aim: Get the shortest distance between two points (Euclidean distance).

  :param p1: point 1 with (x1, y1) coordinates

  :param p2: point 2 with (x2, y2) coordinates

  :return euc_dist: Euclidean distance

  """

  (p1_x, p1_y) = p1

  (p2_x, p2_y) = p2

  euc_dist = math.sqrt((p2_x - p1_x) ** 2 + (p2_y - p1_y) ** 2)

  return euc_dist

def select_figure(dist):

  """

  Aim: Select the figure to draw based on the distance

  :param dist: Distance between the index fingertips

  :return fig: selected figure

  """

  fig = FIGURES_LIST[0]

  if dist > CIR_RADIUS * 2:

      fig = FIGURES_LIST[0]

  if CIR_RADIUS * 2 >= dist > RECT_POINT:

      fig = FIGURES_LIST[1]

  if dist <= RECT_POINT:

      fig = FIGURES_LIST[2]

  return fig

# Camera configurations.

# Except 'CAM_SELECTED', all other settings are optional for faster launch of webcam in Windows

cam = cv2.VideoCapture(CAM_SELECTED, cv2.CAP_DSHOW) # CAP_DSHOW enables direct show without buffering

cam.set(cv2.CAP_PROP_FRAME_WIDTH, CAM_WIDTH)    # Set width of the frame

cam.set(cv2.CAP_PROP_FRAME_HEIGHT, CAM_HEIGHT)  # Set height of the frame

cam.set(cv2.CAP_PROP_FPS, CAM_FPS)  # Set fps of the camera

cam.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'MJPG')) # Set the codec as 'MJPG'

findHands = MpHands()   # Create object

print("Press 'q' to quit")

while True:

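  ok, frame = cam.read()  # Read the frame from camera

  if not ok:  # Stop if the camera did not deliver a frame

      print("Could not read a frame from the camera")

      break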

  if FLIP_CAMERA_FRAME_HORIZONTALLY: # MediaPipe assumes the input image is mirrored. Flip it, if we want

      frame = cv2.flip(frame, 1)

  handData = findHands.marks(frame)   # Get the locations of both hands & fingers

  handDataLength = len(handData)  # Get the number of hands in the frame

  if handDataLength == 2: # We proceed only if there are two hands

      # The handData consists of 21 landmarks of each hand. We are interested in the tip of index fingers only.

      # Calculate the Euclidean distance between the index fingertips.

      indexTipsDist = calc_euclidean_dist(handData[HAND_1][INDEX_FINGER_TIP], handData[HAND_2][INDEX_FINGER_TIP])

      figure = select_figure(indexTipsDist)   # Based on the distance, select the figure to appear on screen

      match figure:

          case "Circles":

              for hand in handData:   # Draw circles with index fingertips as centers

                  circleCenter = hand[INDEX_FINGER_TIP]

                  cv2.circle(frame, circleCenter, CIR_RADIUS, CIR_COLOR, CIR_THICKNESS)

          case "MergedCircle":

              # Draw a circle which encloses our fingertips at min level,

              # so that our fingertips will be on the edge of the circle

              point1 = handData[HAND_1][INDEX_FINGER_TIP]

              point2 = handData[HAND_2][INDEX_FINGER_TIP]

              (x, y), radius = cv2.minEnclosingCircle(np.array([point1, point2], dtype=np.int32)) # points should be passed as a single int32 or float32 numpy array

              mergedCircleCenter = (int(x), int(y))   # opencv wants integer values

              mergedCircleRadius = int(radius)

              cv2.circle(frame, mergedCircleCenter, mergedCircleRadius, MERG_CIR_COLOR, MERG_CIR_THICKNESS)

          case "Rectangle":   # Draw a rectangle with our index fingertips as diagonally opposite corners.

              point1 = (handData[HAND_1][INDEX_FINGER_TIP][X_COORD], handData[HAND_1][INDEX_FINGER_TIP][Y_COORD])

              point2 = (handData[HAND_2][INDEX_FINGER_TIP][X_COORD], handData[HAND_2][INDEX_FINGER_TIP][Y_COORD])

              cv2.rectangle(frame, point1, point2, RECT_COLOR, RECT_THICKNESS)

  else:

      print("Show both hands for the magic to happen or Press 'q' to quit")

  cv2.imshow('Magic Frame', frame)    # Display the frame

  cv2.moveWindow('Magic Frame', 0, 0) # Move the frame to the top-left corner of the monitor

  if cv2.waitKey(1) & 0xff == ord('q'):   # wait for letter 'q' to exit.

      print("Quitting program")

      break

cam.release()   # release the camera

cv2.destroyAllWindows() # close all frame windows

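Comments are used throughout the code to express its intent, and I believe they are self-explanatory. The code is therefore not explained further here, to keep this post from getting too long. PS: a version of the code compatible with Python 3.9 or lower has also been uploaded.

[Demo of the program at work]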
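The `match` statement in the main loop requires Python 3.10 or later. On older interpreters the same dispatch can be written with `if`/`elif`. The following is only a sketch of that idea and not the author's uploaded version. `plan_drawing` is a hypothetical helper that returns what to draw instead of calling OpenCV. It relies on the fact that the smallest circle enclosing two points is centered at their midpoint, with half their distance as its radius.

```python
import math

CIR_RADIUS = 200  # same preset radius as in the program above


def plan_drawing(figure, tip1, tip2):
    """Return a list of (shape, ...) tuples describing what to draw."""
    if figure == "Circles":
        # One fixed-radius circle around each index fingertip
        return [("circle", tip1, CIR_RADIUS), ("circle", tip2, CIR_RADIUS)]
    elif figure == "MergedCircle":
        # Smallest circle through both tips: midpoint centre, half-distance radius
        center = ((tip1[0] + tip2[0]) // 2, (tip1[1] + tip2[1]) // 2)
        radius = int(math.dist(tip1, tip2) / 2)
        return [("circle", center, radius)]
    elif figure == "Rectangle":
        # Fingertips are diagonally opposite corners
        return [("rectangle", tip1, tip2)]
    return []  # unknown figure: draw nothing


print(plan_drawing("MergedCircle", (0, 0), (6, 8)))
# [('circle', (3, 4), 5)]
```

Each returned tuple maps directly onto a `cv2.circle` or `cv2.rectangle` call on the frame.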

