As software engineers, most of the time we feel like true magicians, making applications work by stitching together code snippets from different sources.
That was when we came across Paul McWhorter's video tutorial on "MediaPipe". When it comes to AI, he is one of the best teachers out there.
MediaPipe
MediaPipe offers open-source, cross-platform, customizable ML solutions for live and streaming media. In the video referred to above, he demonstrates how to track the movement of hands and fingers using "MediaPipe Hands", which uses machine learning (ML) to infer 21 3D landmarks of a hand from a single frame.
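Before getting to the project, here is a minimal sketch of what MediaPipe Hands reports for a single frame. It uses the same mp.solutions API as the full code below and assumes a webcam at index 0; the program later in this post wraps these calls in a small class.

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=2)
cam = cv2.VideoCapture(0)              # assumes the default webcam at index 0
ok, frame = cam.read()                 # grab one BGR frame
if ok:
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:   # None when no hand is detected
        for hand in results.multi_hand_landmarks:
            print(len(hand.landmark))  # 21 landmarks per hand; x and y are normalized to [0, 1]
cam.release()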
The Idea
The idea is to extend this work and make circles and a rectangle "magically" appear on the screen. To be precise: when both hands appear in front of the camera, a circle appears around each index fingertip. Bring the hands closer until the circles touch each other, and bingo! they merge into a single circle. Keep bringing the hands closer still and the circle turns into a rectangle.
If that sounds like fun, read on!
Steps
Figures are drawn based on the distance between the index fingertips.
The steps are:
1. Use MediaPipe to find both hands and all the fingers.
2. Get the x & y coordinates of the index fingertip (landmark 8) of each hand.
3. Calculate the Euclidean distance between these two fingertips and pick the figure accordingly (see the short sketch after this list):
· If the distance is greater than twice a preset radius (r), draw a circle of radius r centred on each fingertip.
· If the distance is between twice the radius and a preset rectangle threshold, draw a single circle that encloses both index fingertips.
· If the distance is smaller than the rectangle threshold, draw a rectangle with the index fingertips as diagonally opposite corners.
4. Use OpenCV to draw these figures.
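To make the thresholds concrete, here is a small sketch of that selection rule using the same default values as the full program below (circle radius 200 px, rectangle threshold 300 px). The name pick_figure is only illustrative; the actual program implements the same logic in select_figure.

CIR_RADIUS = 200   # preset circle radius r
RECT_POINT = 300   # distance below which the rectangle appears

def pick_figure(dist):
    if dist > CIR_RADIUS * 2:   # fingertips far apart -> two separate circles
        return "Circles"
    if dist > RECT_POINT:       # circles would overlap -> one merged circle
        return "MergedCircle"
    return "Rectangle"          # fingertips very close -> rectangle

print(pick_figure(500), pick_figure(350), pick_figure(250))  # Circles MergedCircle Rectangle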
Coding
The main libraries for this program are MediaPipe, OpenCV and NumPy. Install them with pip (for example, pip install mediapipe opencv-python numpy). Using a virtual environment is strongly recommended.
The complete code is available on this GitHub page:
"""
A fun project to make circles & rectangle 'magically' appear on the screen
Platform: Windows 10
Python Version: 3.10+
Major libraries: MediaPipe, OpenCV, NumPy
"""
import cv2
import numpy as np
import math
# Camera settings
DEFAULT_CAM = 0 # Built-in camera
USB_CAM = 1 # External camera connected via USB port
CAM_SELECTED = DEFAULT_CAM
CAM_WIDTH = 1280
CAM_HEIGHT = 720
CAM_FPS = 30
FLIP_CAMERA_FRAME_HORIZONTALLY = True
# mediapipe parameters
MAX_HANDS = 2
DETECTION_CONF = 0.5
TRACKING_CONF = 0.5
MODEL_COMPLEX = 1
HAND_1 = 0
HAND_2 = 1
INDEX_FINGER_TIP = 8
X_COORD = 0
Y_COORD = 1
FIGURES_LIST = ["Circles", "MergedCircle", "Rectangle"]
# Drawing parameters - for opencv
CIR_RADIUS = 200
CIR_COLOR = (255, 0, 0)
CIR_THICKNESS = 3
MERG_CIR_COLOR = (0, 255, 0)
MERG_CIR_THICKNESS = 3
RECT_POINT = 300
RECT_COLOR = (0, 0, 255)
RECT_THICKNESS = 3
class MpHands:
    import mediapipe as mp

    def __init__(self, max_hands=MAX_HANDS, det_conf=DETECTION_CONF, complexity=MODEL_COMPLEX, track_conf=TRACKING_CONF):
        """
        Inputs:-
            static_image_mode: Mode of input. If set to False, the solution treats the input images as a video stream.
            max_num_hands: Maximum number of hands to detect. Default to 2
            model_complexity: Complexity of the hand landmark model. 0 or 1.
                              Landmark accuracy as well as inference latency generally go up with the model complexity.
                              Default to 1.
            min_detection_confidence: Minimum confidence value ([0.0, 1.0]) from the hand detection model for the
                                      detection to be considered successful. Default to 0.5.
            min_tracking_confidence: Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the
                                     hand landmarks to be considered tracked successfully.
                                     Ignored if static_image_mode is True. Default to 0.5.
        Output:-
            multi_hand_landmarks: Collection of detected/tracked hands, where each hand is represented as a
                                  list of 21 hand landmarks and each landmark is composed of x, y and z.
                                  x and y are normalized to [0.0, 1.0] by the image width and height respectively.
        """
        self.hands = self.mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=max_hands,
                                                   model_complexity=complexity, min_detection_confidence=det_conf,
                                                   min_tracking_confidence=track_conf)

    def marks(self, video_frame):
        """
        Aim: To get the X & Y coordinates of all the 21 landmarks of both hands
        :param video_frame: captured frame from the opencv. This is in BGR format.
        :return my_hands: Array of hands with 21 landmarks (X & Y) of each hand
                          [[(h1_x0,h1_y0), (h1_x1,h1_y1), ...(h1_x20,h1_y20)],
                           [(h2_x0,h2_y0), (h2_x1,h2_y1), ...(h2_x20,h2_y20)], ...]
        """
        my_hands = []
        frame_rgb = cv2.cvtColor(video_frame, cv2.COLOR_BGR2RGB)  # opencv works in BGR, while rest of the world in RGB
        multi_hand_landmarks = self.hands.process(frame_rgb).multi_hand_landmarks
        if multi_hand_landmarks:  # Do the following if we have detected/tracked hands
            # multi_hand_landmarks is an array of arrays. Each array contains the 21 landmarks of each hand
            for hand_landmarks in multi_hand_landmarks:  # Stepping through each hand
                my_hand = []
                for land_mark in hand_landmarks.landmark:  # Stepping through the 21 landmarks of each hand
                    # Each landmark has x, y & z coordinates. We are interested in x & y only.
                    # Since x & y are normalized, multiply them with camera width and height to get the actual values.
                    # Finally, convert the coordinates into integers for opencv
                    my_hand.append((int(land_mark.x * CAM_WIDTH), int(land_mark.y * CAM_HEIGHT)))
                my_hands.append(my_hand)
        return my_hands
def calc_euclidean_dist(p1, p2):
    """
    Aim: Get the shortest distance between two points (Euclidean distance).
    :param p1: point 1 with (x1, y1) coordinates
    :param p2: point 2 with (x2, y2) coordinates
    :return euc_dist: Euclidean distance
    """
    (p1_x, p1_y) = p1
    (p2_x, p2_y) = p2
    euc_dist = math.sqrt((p2_x - p1_x) ** 2 + (p2_y - p1_y) ** 2)
    return euc_dist
def select_figure(dist):
    """
    Aim: Select the figure to draw based on the distance
    :param dist: Distance between the index fingertips
    :return fig: selected figure
    """
    fig = FIGURES_LIST[0]
    if dist > CIR_RADIUS * 2:
        fig = FIGURES_LIST[0]
    if CIR_RADIUS * 2 >= dist > RECT_POINT:
        fig = FIGURES_LIST[1]
    if dist <= RECT_POINT:
        fig = FIGURES_LIST[2]
    return fig
# Camera configurations.
# Except 'CAM_SELECTED', the settings below are optional; they mainly help the webcam launch faster on Windows
cam = cv2.VideoCapture(CAM_SELECTED, cv2.CAP_DSHOW)  # CAP_DSHOW selects the DirectShow capture backend on Windows
cam.set(cv2.CAP_PROP_FRAME_WIDTH, CAM_WIDTH) # Set width of the frame
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, CAM_HEIGHT) # Set height of the frame
cam.set(cv2.CAP_PROP_FPS, CAM_FPS) # Set fps of the camera
cam.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'MJPG')) # Set the codec as 'MJPG'
findHands = MpHands() # Create object
print("Press 'q' to quit")
while True:
    ignore, frame = cam.read()  # Read the frame from camera
    if FLIP_CAMERA_FRAME_HORIZONTALLY:  # Flip for a mirrored (selfie) view so on-screen movement matches your own
        frame = cv2.flip(frame, 1)
    handData = findHands.marks(frame)  # Get the locations of both hands & fingers
    handDataLength = len(handData)  # Get the number of hands in the frame
    if handDataLength == 2:  # We proceed only if there are two hands
        # The handData consists of 21 landmarks of each hand. We are interested in the tip of index fingers only.
        # Calculate the Euclidean distance between the index fingertips.
        indexTipsDist = calc_euclidean_dist(handData[HAND_1][INDEX_FINGER_TIP], handData[HAND_2][INDEX_FINGER_TIP])
        figure = select_figure(indexTipsDist)  # Based on the distance, select the figure to appear on screen
        match figure:
            case "Circles":
                for hand in handData:  # Draw circles with index fingertips as centers
                    circleCenter = hand[INDEX_FINGER_TIP]
                    cv2.circle(frame, circleCenter, CIR_RADIUS, CIR_COLOR, CIR_THICKNESS)
            case "MergedCircle":
                # Draw a circle which encloses our fingertips at min level,
                # so that our fingertips will be on the edge of the circle
                point1 = handData[HAND_1][INDEX_FINGER_TIP]
                point2 = handData[HAND_2][INDEX_FINGER_TIP]
                (x, y), radius = cv2.minEnclosingCircle(np.array([point1, point2]))  # points should be passed as a single numpy array
                mergedCircleCenter = (int(x), int(y))  # opencv wants integer values
                mergedCircleRadius = int(radius)
                cv2.circle(frame, mergedCircleCenter, mergedCircleRadius, MERG_CIR_COLOR, MERG_CIR_THICKNESS)
            case "Rectangle":  # Draw a rectangle with our index fingertips as diagonally opposite corners.
                point1 = (handData[HAND_1][INDEX_FINGER_TIP][X_COORD], handData[HAND_1][INDEX_FINGER_TIP][Y_COORD])
                point2 = (handData[HAND_2][INDEX_FINGER_TIP][X_COORD], handData[HAND_2][INDEX_FINGER_TIP][Y_COORD])
                cv2.rectangle(frame, point1, point2, RECT_COLOR, RECT_THICKNESS)
    else:
        print("Show both hands for the magic to happen or Press 'q' to quit")
    cv2.imshow('Magic Frame', frame)  # Display the frame
    cv2.moveWindow('Magic Frame', 0, 0)  # Move the frame to the top-left corner of the monitor
    if cv2.waitKey(1) & 0xff == ord('q'):  # wait for letter 'q' to exit.
        print("Quitting program")
        break
cam.release()  # release the camera
cv2.destroyAllWindows()  # close all frame windows
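A side note on compatibility: the match/case block requires Python 3.10 or newer. On older interpreters it can presumably be replaced by a plain if/elif chain along these lines (a drop-in sketch that reuses the names defined in the program above):

if figure == "Circles":
    for hand in handData:  # a circle centred on each index fingertip
        cv2.circle(frame, hand[INDEX_FINGER_TIP], CIR_RADIUS, CIR_COLOR, CIR_THICKNESS)
elif figure == "MergedCircle":
    point1 = handData[HAND_1][INDEX_FINGER_TIP]
    point2 = handData[HAND_2][INDEX_FINGER_TIP]
    (x, y), radius = cv2.minEnclosingCircle(np.array([point1, point2]))
    cv2.circle(frame, (int(x), int(y)), int(radius), MERG_CIR_COLOR, MERG_CIR_THICKNESS)
elif figure == "Rectangle":
    point1 = handData[HAND_1][INDEX_FINGER_TIP]
    point2 = handData[HAND_2][INDEX_FINGER_TIP]
    cv2.rectangle(frame, point1, point2, RECT_COLOR, RECT_THICKNESS)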
Comments are used throughout the code to express the intent, and it should be largely self-explanatory, so the code is not walked through line by line here, to keep this from becoming a lengthy post. PS: code compatible with Python 3.9 or lower has also been uploaded. Here is a demo of it working.