How to Implement Image Recognition with TensorFlow in Real-Time Audio and Video

Abstract: We use the Agora Python SDK to handle audio and video encoding and decoding, noise suppression, echo cancellation, and low-latency transmission, and pass the video frames to TensorFlow in RGB format. TensorFlow then performs image recognition and the result is returned to the client.

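For the past two years, Python has consistently ranked among the five most popular programming languages. It has an active community and a rich set of third-party libraries, including web frameworks, crawler frameworks, data analysis frameworks, and machine learning frameworks, so developers do not need to reinvent the wheel. With Python you can do web programming and network programming, build multimedia applications, analyze data, or implement image recognition. Image recognition is one of the most popular of these use cases, and also one of the best fits for real-time audio and video.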

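This article first shows how to implement image recognition with TensorFlow. We then combine TensorFlow with the Agora Python SDK to perform image recognition during a real-time audio and video call. The Agora Python SDK handles audio and video encoding and decoding, noise suppression, echo cancellation, and low-latency transmission, and passes video frames to TensorFlow in RGB format. TensorFlow then performs image recognition and the result is returned to the client.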

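Let us start with what the Demo produces. The left image is the remote video and the lower right corner shows our local video. Once the remote image has been recognized, the result appears in the text box on the right. Note that the sticker was added in post-production; the Demo does not yet support stickers. If you are interested, you can build on the Demo and add more features.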

First, we introduce the principles and methods of image recognition with TensorFlow.

TensorFlow Image and Object Recognition

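TensorFlow is Google's open-source deep learning library. With this framework and the Python programming language you can build a wide range of machine learning applications. Many people have also open-sourced applications and other frameworks built on TensorFlow on GitHub. In this article we use TensorFlow to learn about object recognition.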

TensorFlow provides an API for detecting the objects contained in an image or a video.

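Object detection finds every object that appears in an image and marks each one with a rectangle (a bounding box). Objects can belong to many classes, such as people, cars, animals, and road signs. The example below shows how to use the TensorFlow object detection API with the pretrained ssd_mobilenet_v1_coco model (SSD stands for Single Shot MultiBox Detector). Other pretrained object detection models are also available.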
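To make the rectangles concrete: the detection API reports each box as four normalized values in the order [ymin, xmin, ymax, xmax], which is why the visualization call later passes use_normalized_coordinates=True. The sketch below converts such a box to pixel coordinates. The function name to_pixel_box and the sample values are hypothetical and only for illustration.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * image_width))
    top = int(round(ymin * image_height))
    right = int(round(xmax * image_width))
    bottom = int(round(ymax * image_height))
    return left, top, right, bottom


# Hypothetical detection covering the right half of a 640x480 image, vertically centered.
print(to_pixel_box([0.25, 0.5, 0.75, 1.0], 640, 480))  # (320, 120, 640, 360)
```

Keeping boxes normalized until drawing time means the same detection result can be overlaid on a resized copy of the frame.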

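Load the libraries

# -*- coding: utf-8 -*-

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from PIL import Image

# Helper modules from the object_detection directory of the TensorFlow models repository
from utils import label_map_util
from utils import visualization_utils as vis_util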

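Define some constants

# Frozen inference graph of the pretrained model
PATH_TO_CKPT = 'ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb'
# Label map that translates class ids into readable names
PATH_TO_LABELS = 'ssd_mobilenet_v1_coco_2017_11_17/mscoco_label_map.pbtxt'
# Maximum number of classes to read from the label map
NUM_CLASSES = 90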

Load the pretrained model

# TensorFlow 1.x graph API: read the serialized GraphDef and import it into a new graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

Load the class label data

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
# category_index maps a class id to a dict that holds the class name
category_index = label_map_util.create_category_index(categories)
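The later lookup `category_index[int(c)]['name']` relies on the index being a dictionary keyed by class id, where each value is itself a dictionary with a `name` entry. The following is a minimal stand-in for that structure, not the library's implementation. The helper build_category_index is hypothetical, and only two entries are listed for illustration.

```python
def build_category_index(categories):
    """Key each category dict by its integer id so a class id can be looked up directly."""
    return {category['id']: category for category in categories}


# Two illustrative entries; the real label map holds many more classes.
categories = [
    {'id': 1, 'name': 'person'},
    {'id': 18, 'name': 'dog'},
]
category_index = build_category_index(categories)
print(category_index[18]['name'])  # dog
```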

Convert an image to an array, and set the test image paths

def load_image_into_numpy_array(image):
    # PIL reports size as (width, height); the array layout is (height, width, channels)
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)

TEST_IMAGE_PATHS = ['test_images/image1.jpg', 'test_images/image2.jpg']

Run object detection with the model

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Input and output tensors of the frozen graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            image_np = load_image_into_numpy_array(image)
            # Add a batch dimension: (1, height, width, 3)
            image_np_expanded = np.expand_dims(image_np, axis=0)
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})

            # Draw the boxes and labels onto image_np in place
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np, np.squeeze(boxes), np.squeeze(classes).astype(np.int32),
                np.squeeze(scores), category_index,
                use_normalized_coordinates=True, line_thickness=8)
            plt.figure(figsize=[12, 8])
            plt.imshow(image_np)
            plt.show()

The detection results are as follows. Two dogs are detected in the first image.

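TensorFlow Object Recognition in Real-Time Audio and Video

Object recognition on still images with TensorFlow is already fairly mature. How do we apply it to the many real-world scenarios that involve real-time audio and video interaction? Below we explain how to do object recognition on top of the Agora real-time video SDK.

A video is simply a sequence of image frames, so recognizing objects in a video means recognizing objects in each frame. In that sense there is no fundamental difference between the two tasks. With this in mind, object recognition with TensorFlow in a real-time audio and video scenario becomes relatively straightforward.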
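In practice, detection on one frame can take longer than the interval between frames, so some frames have to be skipped. The demo code in the next steps handles this with an isImageDetect flag. The sketch below shows the same idea in isolation, under the assumption that frames arriving while the detector is busy are simply dropped. The class name FrameGate is hypothetical, and the lock is an addition that the demo's plain flag does not have.

```python
import threading


class FrameGate:
    """Lets one frame through at a time; frames that arrive while busy are dropped."""

    def __init__(self):
        self._lock = threading.Lock()
        self._busy = False

    def try_acquire(self):
        """Return True and mark the gate busy if no frame is currently being processed."""
        with self._lock:
            if self._busy:
                return False
            self._busy = True
            return True

    def release(self):
        """Mark detection as finished so the next frame can pass."""
        with self._lock:
            self._busy = False


gate = FrameGate()
accepted = []
for frame_id in range(5):
    if gate.try_acquire():
        accepted.append(frame_id)
    if frame_id == 2:
        gate.release()  # pretend detection on the accepted frame just finished

print(accepted)  # [0, 3]
```

Dropping frames keeps the displayed result close to the live picture, at the cost of not analyzing every frame.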

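1) Read the Agora real-time audio and video, and capture an image from the remote video stream

import ctypes
import cv2
import numpy as np
from PIL import Image

# EventHandlerData is the demo's shared state object, defined elsewhere in the project
def onRenderVideoFrame(uid, width, height, yStride,
                       uStride, vStride, yBuffer, uBuffer, vBuffer,
                       rotation, renderTimeMs, avsync_type):
    # isImageDetect is True once the previous frame has been recognized.
    # Only then do we convert a new frame, and we set the flag to False afterwards.
    if EventHandlerData.isImageDetect:
        # Wrap the raw plane buffers; this assumes rows are tightly packed (stride equals width)
        y_array = (ctypes.c_uint8 * (width * height)).from_address(yBuffer)
        u_array = (ctypes.c_uint8 * ((width // 2) * (height // 2))).from_address(uBuffer)
        v_array = (ctypes.c_uint8 * ((width // 2) * (height // 2))).from_address(vBuffer)

        Y = np.frombuffer(y_array, dtype=np.uint8).reshape(height, width)
        # U and V are stored at half resolution; repeat each sample to match Y
        U = np.frombuffer(u_array, dtype=np.uint8).reshape((height // 2, width // 2)).repeat(2, axis=0).repeat(2, axis=1)
        V = np.frombuffer(v_array, dtype=np.uint8).reshape((height // 2, width // 2)).repeat(2, axis=0).repeat(2, axis=1)
        YUV = np.dstack((Y, U, V))[:height, :width, :]
        # Most AI models are trained on RGB images, while the Agora video callback
        # delivers YUV data, so we convert the format here
        RGB = cv2.cvtColor(YUV, cv2.COLOR_YUV2RGB)
        EventHandlerData.image = Image.fromarray(RGB)
        EventHandlerData.isImageDetect = False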
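The callback above does two things to each frame: it stretches the half-resolution U and V planes to full size, and it converts every YUV pixel to RGB. The sketch below does both steps in plain Python on tiny inputs so the arithmetic is visible. It assumes full-range BT.601 coefficients; the constants and rounding that OpenCV uses for COLOR_YUV2RGB may differ, so treat the numbers as illustrative. The function names are hypothetical.

```python
def upsample_2x(plane, width, height):
    """Nearest-neighbour upsample of a half-resolution chroma plane given as a list of rows."""
    return [[plane[y // 2][x // 2] for x in range(width)] for y in range(height)]


def clamp_u8(value):
    """Round and clamp a value to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one pixel, assuming full-range BT.601 coefficients (an assumption)."""
    d = u - 128
    e = v - 128
    return (
        clamp_u8(y + 1.402 * e),
        clamp_u8(y - 0.344136 * d - 0.714136 * e),
        clamp_u8(y + 1.772 * d),
    )


# A 4x2 frame: the 2x1 chroma plane is stretched to cover every luma sample.
u_full = upsample_2x([[100, 200]], width=4, height=2)
print(u_full)                     # [[100, 100, 200, 200], [100, 100, 200, 200]]
print(yuv_to_rgb(128, 128, 128))  # (128, 128, 128): neutral chroma gives gray
```

The NumPy and OpenCV calls in the callback do the same work on whole planes at once, which is what makes per-frame conversion fast enough for live video.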

2) Run TensorFlow object recognition on the captured image

# QThread and pyqtSignal come from PyQt; EventHandlerData is the demo's shared state object
class objectDetectThread(QThread):
    objectSignal = pyqtSignal(str)

    def __init__(self):
        super().__init__()

    def run(self):
        detection_graph = EventHandlerData.detection_graph
        with detection_graph.as_default():
            with tf.Session(graph=detection_graph) as sess:
                (im_width, im_height) = EventHandlerData.image.size
                image_np = np.array(EventHandlerData.image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
                image_np_expanded = np.expand_dims(image_np, axis=0)
                image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
                boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
                scores = detection_graph.get_tensor_by_name('detection_scores:0')
                classes = detection_graph.get_tensor_by_name('detection_classes:0')
                num_detections = detection_graph.get_tensor_by_name('num_detections:0')
                (boxes, scores, classes, num_detections) = sess.run(
                    [boxes, scores, classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})
                objectText = []
                # If the detection score is above 40%, show the recognized object in the text box
                for i, c in enumerate(classes[0]):
                    if scores[0][i] > 0.4:
                        label = EventHandlerData.category_index[int(c)]['name']
                        if label not in objectText:
                            objectText.append(label)
                    else:
                        break
                self.objectSignal.emit(', '.join(objectText))
                EventHandlerData.detectReady = True
                # This frame is done: set isImageDetect back to True so that the
                # callback reads and converts the next Agora remote video frame
                EventHandlerData.isImageDetect = True
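The filtering loop at the end of run() can be tried on its own without TensorFlow or PyQt. The sketch below reproduces it as a plain function: keep labels whose score exceeds the threshold, skip duplicates, and stop at the first low score. Stopping early assumes the scores arrive sorted from high to low. The function name and the sample data are hypothetical.

```python
def labels_above_threshold(classes, scores, category_index, threshold=0.4):
    """Return unique class names whose score exceeds the threshold, in detection order."""
    labels = []
    for class_id, score in zip(classes, scores):
        if score <= threshold:
            break  # assumes scores are sorted in descending order
        name = category_index[int(class_id)]['name']
        if name not in labels:
            labels.append(name)
    return labels


# Made-up detections: two dogs and one person above the threshold, one weak hit below it.
category_index = {1: {'id': 1, 'name': 'person'}, 18: {'id': 18, 'name': 'dog'}}
classes = [18.0, 18.0, 1.0, 1.0]
scores = [0.93, 0.81, 0.55, 0.12]
print(', '.join(labels_above_threshold(classes, scores, category_index)))  # dog, person
```

Raising the threshold trades missed objects for fewer false labels in the text box.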

We have uploaded this Demo and the Agora Python SDK to GitHub, where you can download and use them directly: Agora Python TensorFlow Demo

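Agora Python TensorFlow Demo Build Guide

Download the Agora Python SDK
On Windows, copy the .pyd and .dll files into the root of this project folder; on iOS, copy the .so file into the root of this folder
Download the TensorFlow models, then copy the object_detection folder into the root of this folder
Install Protobuf, then run:

protoc object_detection/protos/*.proto --python_out=.

Download a pretrained model
We recommend ssd_mobilenet_v1_coco and ssdlite_mobilenet_v2_coco, because they run relatively fast
Extract the frozen graph by running this on the command line:

python extractGraph.py --model_file='FILE_NAME_OF_YOUR_MODEL'

Finally, change the model name in callBack.py and the Appid in demo.py, then run the Demo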

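Note that this Demo is for demonstration only. It takes 2 to 4 seconds from receiving the remote real-time video frame, through recognition in TensorFlow, to displaying the result. Recognition speed also varies with network conditions, device performance, and the model used. Interested developers can swap in their own model to reduce the recognition latency.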
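Before swapping models, it helps to know which stage accounts for most of the delay. The sketch below is one way to time the stages separately using only the standard library. The helper stage_timer and the stage names are hypothetical, and the two timed bodies are placeholders for the real frame conversion and the real sess.run call.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(name, timings):
    """Record the wall-clock duration of the enclosed block under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


timings = {}
with stage_timer('convert', timings):
    sum(range(10000))   # placeholder for the YUV to RGB conversion
with stage_timer('detect', timings):
    time.sleep(0.01)    # placeholder for the TensorFlow inference call

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1000:.1f} ms')
```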

If you run into problems with the Demo, please open an issue on GitHub.

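About the Author

Jin Yeqing (金葉清) has 6 years of experience in R&D and operations and has worked in developer communities for nearly five years. He previously handled operations for the Huawei developer community and now works at Agora as a developer evangelist.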
