@@ -0,0 +1,137 @@
| <div align=center> | |||||
| <img src="http://affluent.oss-cn-hangzhou.aliyuncs.com/html/images/dface_logo.png" width="350"> | |||||
| </div> | |||||
| ----------------- | |||||
| # DFace • [](https://opensource.org/licenses/Apache-2.0) [](https://gitter.im/cmusatyalab/DFace) | |||||
| | **`Linux CPU`** | **`Linux GPU`** | **`Mac OS CPU`** | **`Windows CPU`** | | |||||
| |-----------------|---------------------|------------------|-------------------| | |||||
| | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | | |||||
**Real-time multi-face detection and recognition based on multi-task cascaded convolutional networks (MTCNN) and Center-Loss.**
**DFace** is an open-source deep-learning face detection and recognition system. All features are implemented with the **[pytorch](https://github.com/pytorch/pytorch)** framework. PyTorch is a deep-learning framework developed by Facebook that offers interesting advanced features such as automatic differentiation and dynamic graph construction. DFace naturally inherits these strengths, which makes training simpler and keeps the code clear and easy to understand.
DFace can use CUDA for GPU acceleration. We recommend trying the Linux GPU mode, which runs at close to real-time speed.
All of the inspiration comes from recent academic work, such as [Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks](https://arxiv.org/abs/1604.02878) and [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/abs/1503.03832)
**MTCNN Structure**
|  | |||||
**If you are interested in DFace and want to contribute, please review the CONTRIBUTING.md in the project root; it keeps a live list of @TODO items. We use [issues](https://github.com/DFace/DFace/issues) to track and follow up on all problems.**
## Installation
DFace has two major modules: face detection and face recognition. I provide detailed steps for training and running every model. You first need to set up a Python environment with pytorch and cv2; I recommend Anaconda for creating an independent virtual environment.
### Requirements
| * cuda 8.0 | |||||
| * anaconda | |||||
| * pytorch | |||||
| * torchvision | |||||
| * cv2 | |||||
| * matplotlib | |||||
Here I provide an Anaconda environment dependency file, environment.yml, which makes it easy to build your own virtual environment:
| ```shell | |||||
| conda env create -f path/to/environment.yml | |||||
| ``` | |||||
### Face Detection
If you are interested in the MTCNN model, the following steps may help you.
#### Train the MTCNN Model
MTCNN consists of three networks, called **PNet**, **RNet** and **ONet**, so training proceeds in three successive stages. For better results, each network being trained depends on the previously trained network to generate its training data. All face datasets come from **[WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)** and **[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)**. WIDER FACE only provides a large number of face bounding-box annotations, while CelebA also includes facial landmark annotations.
* Generate PNet training data and annotation files
| ```shell | |||||
| python src/prepare_data/gen_Pnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} | |||||
| ``` | |||||
* Assemble and shuffle the annotation files
| ```shell | |||||
| python src/prepare_data/assemble_pnet_imglist.py | |||||
| ``` | |||||
* Train the PNet model
| ```shell | |||||
| python src/train_net/train_p_net.py | |||||
| ``` | |||||
* Generate RNet training data and annotation files
| ```shell | |||||
python src/prepare_data/gen_Rnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before}
| ``` | |||||
* Assemble and shuffle the annotation files
| ```shell | |||||
| python src/prepare_data/assemble_rnet_imglist.py | |||||
| ``` | |||||
* Train the RNet model
| ```shell | |||||
| python src/train_net/train_r_net.py | |||||
| ``` | |||||
* Generate ONet training data and annotation files
| ```shell | |||||
python src/prepare_data/gen_Onet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before} --rmodel_file {your RNet model file trained before}
| ``` | |||||
* Generate ONet landmark training data and annotation files
| ```shell | |||||
| python src/prepare_data/gen_landmark_48.py | |||||
| ``` | |||||
* Assemble and shuffle the annotation files (including landmarks)
| ```shell | |||||
| python src/prepare_data/assemble_onet_imglist.py | |||||
| ``` | |||||
* Train the ONet model
| ```shell | |||||
| python src/train_net/train_o_net.py | |||||
| ``` | |||||
#### Test Face Detection
| ```shell | |||||
| python test_image.py | |||||
| ``` | |||||
### Face Recognition
| TODO | |||||
## Demo
|  | |||||
| ## License | |||||
| [Apache License 2.0](LICENSE) | |||||
| ## Reference | |||||
| * [Seanlinx/mtcnn](https://github.com/Seanlinx/mtcnn) | |||||
@@ -0,0 +1,141 @@
| <div align=center> | |||||
| <a href="http://dface.io" target="_blank"><img src="http://pic.dface.io/dfacelogoblue.png" width="350"></a> | |||||
| </div> | |||||
| ----------------- | |||||
| # DFace • [](https://opensource.org/licenses/Apache-2.0) [](https://gitter.im/cmusatyalab/DFace) | |||||
| | **`Linux CPU`** | **`Linux GPU`** | **`Mac OS CPU`** | **`Windows CPU`** | | |||||
| |-----------------|---------------------|------------------|-------------------| | |||||
| | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | | |||||
**Free and open-source face detection and recognition with
deep learning, based on MTCNN and a ResNet trained with Center-Loss.**
| [中文版 README](https://github.com/kuaikuaikim/DFace/blob/master/README_zh.md) | |||||
**DFace** is open-source software for face detection and recognition. All features are implemented with **[pytorch](https://github.com/pytorch/pytorch)**, the deep-learning framework from Facebook. PyTorch uses a technique called reverse-mode auto-differentiation, which allows developers to change the way a network behaves arbitrarily with zero lag or overhead.
DFace inherits these characteristics, which keeps it dynamic and makes its code easy to review.
DFace supports GPU acceleration with NVIDIA CUDA. We highly recommend the Linux GPU version; it is very fast and runs in real time.
Our inspiration comes from several research papers on this topic, as well as current and past work such as [Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks](https://arxiv.org/abs/1604.02878) and, on face recognition, [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/abs/1503.03832)
| **MTCNN Structure** | |||||
|  | |||||
**If you want to contribute to DFace, please review the CONTRIBUTING.md in the project. We use [GitHub issues](https://github.com/DFace/DFace/issues) for
tracking requests and bugs.**
| ## Installation | |||||
DFace has two major modules: detection and recognition. For both, we provide tutorials covering model training and inference.
First, set up pytorch and cv2. We suggest Anaconda for creating an independent virtual Python environment.
| ### Requirements | |||||
| * cuda 8.0 | |||||
| * anaconda | |||||
| * pytorch | |||||
| * torchvision | |||||
| * cv2 | |||||
| * matplotlib | |||||
We also provide an Anaconda environment dependency list, environment.yml, in the root path,
so you can create your DFace environment very easily:
| ```shell | |||||
| conda env create -f path/to/environment.yml | |||||
| ``` | |||||
### Face Detection
If you are interested in how to train an MTCNN model, follow the steps below.
#### Train the MTCNN Model
MTCNN consists of three networks, called **PNet**, **RNet** and **ONet**, so training proceeds in three stages. Each stage depends on the previously trained network, which generates the training data fed to the current one.
Please download the training face **datasets** before training. We use **[WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)** and **[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)**
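Besides classification, each network also regresses bounding-box offsets that calibrate the candidate boxes (the `boxes_align` computation in this repository's detector). A minimal numpy sketch with illustrative values:

```python
import numpy as np

# one candidate box [x1, y1, x2, y2] and its regression offsets from the net
box = np.array([[0.0, 0.0, 99.0, 99.0]])
reg = np.array([[0.1, 0.05, -0.1, -0.05]])

bw = box[:, 2] - box[:, 0] + 1   # box width  = 100
bh = box[:, 3] - box[:, 1] + 1   # box height = 100

# offsets are expressed as fractions of the box size
aligned = box.copy()
aligned[:, 0] += reg[:, 0] * bw  # x1: 0  + 0.1  * 100 = 10
aligned[:, 1] += reg[:, 1] * bh  # y1: 0  + 0.05 * 100 = 5
aligned[:, 2] += reg[:, 2] * bw  # x2: 99 - 0.1  * 100 = 89
aligned[:, 3] += reg[:, 3] * bh  # y2: 99 - 0.05 * 100 = 94
print(aligned)  # [[10.  5. 89. 94.]]
```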
* Generate PNet training data and annotation file
| ```shell | |||||
| python src/prepare_data/gen_Pnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} | |||||
| ``` | |||||
| * Assemble annotation file and shuffle it | |||||
| ```shell | |||||
| python src/prepare_data/assemble_pnet_imglist.py | |||||
| ``` | |||||
| * Train PNet model | |||||
| ```shell | |||||
| python src/train_net/train_p_net.py | |||||
| ``` | |||||
* Generate RNet training data and annotation file
| ```shell | |||||
python src/prepare_data/gen_Rnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before}
| ``` | |||||
| * Assemble annotation file and shuffle it | |||||
| ```shell | |||||
| python src/prepare_data/assemble_rnet_imglist.py | |||||
| ``` | |||||
| * Train RNet model | |||||
| ```shell | |||||
| python src/train_net/train_r_net.py | |||||
| ``` | |||||
* Generate ONet training data and annotation file
| ```shell | |||||
python src/prepare_data/gen_Onet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before} --rmodel_file {your RNet model file trained before}
| ``` | |||||
* Generate ONet landmark training data and annotation file
| ```shell | |||||
| python src/prepare_data/gen_landmark_48.py | |||||
| ``` | |||||
| * Assemble annotation file and shuffle it | |||||
| ```shell | |||||
| python src/prepare_data/assemble_onet_imglist.py | |||||
| ``` | |||||
| * Train ONet model | |||||
| ```shell | |||||
| python src/train_net/train_o_net.py | |||||
| ``` | |||||
| #### Test face detection | |||||
| ```shell | |||||
| python test_image.py | |||||
| ``` | |||||
| ### Face Recognition | |||||
| TODO | |||||
| ## Demo | |||||
|  | |||||
| ## License | |||||
| [Apache License 2.0](LICENSE) | |||||
| ## Reference | |||||
| * [Seanlinx/mtcnn](https://github.com/Seanlinx/mtcnn) | |||||
@@ -0,0 +1 @@
This directory stores the annotation files of the training data
@@ -0,0 +1,66 @@
| name: pytorch | |||||
| channels: | |||||
| - soumith | |||||
| - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free | |||||
| - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ | |||||
| - defaults | |||||
| dependencies: | |||||
| - cairo=1.14.8=0 | |||||
| - certifi=2016.2.28=py27_0 | |||||
| - cffi=1.10.0=py27_0 | |||||
| - fontconfig=2.12.1=3 | |||||
| - freetype=2.5.5=2 | |||||
| - glib=2.50.2=1 | |||||
| - harfbuzz=0.9.39=2 | |||||
| - hdf5=1.8.17=2 | |||||
| - jbig=2.1=0 | |||||
| - jpeg=8d=2 | |||||
| - libffi=3.2.1=1 | |||||
| - libgcc=5.2.0=0 | |||||
| - libiconv=1.14=0 | |||||
| - libpng=1.6.30=1 | |||||
| - libtiff=4.0.6=2 | |||||
| - libxml2=2.9.4=0 | |||||
| - mkl=2017.0.3=0 | |||||
| - numpy=1.12.1=py27_0 | |||||
| - olefile=0.44=py27_0 | |||||
| - opencv=3.1.0=np112py27_1 | |||||
| - openssl=1.0.2l=0 | |||||
| - pcre=8.39=1 | |||||
| - pillow=3.4.2=py27_0 | |||||
| - pip=9.0.1=py27_1 | |||||
| - pixman=0.34.0=0 | |||||
| - pycparser=2.18=py27_0 | |||||
| - python=2.7.13=0 | |||||
| - readline=6.2=2 | |||||
| - setuptools=36.4.0=py27_1 | |||||
| - six=1.10.0=py27_0 | |||||
| - sqlite=3.13.0=0 | |||||
| - tk=8.5.18=0 | |||||
| - wheel=0.29.0=py27_0 | |||||
| - xz=5.2.3=0 | |||||
| - zlib=1.2.11=0 | |||||
| - cycler=0.10.0=py27_0 | |||||
| - dbus=1.10.20=0 | |||||
| - expat=2.1.0=0 | |||||
| - functools32=3.2.3.2=py27_0 | |||||
| - gst-plugins-base=1.8.0=0 | |||||
| - gstreamer=1.8.0=0 | |||||
| - icu=54.1=0 | |||||
| - libxcb=1.12=1 | |||||
| - matplotlib=2.0.2=np112py27_0 | |||||
| - pycairo=1.10.0=py27_0 | |||||
| - pyparsing=2.2.0=py27_0 | |||||
| - pyqt=5.6.0=py27_2 | |||||
| - python-dateutil=2.6.1=py27_0 | |||||
| - pytz=2017.2=py27_0 | |||||
| - qt=5.6.2=2 | |||||
| - sip=4.18=py27_0 | |||||
| - subprocess32=3.2.7=py27_0 | |||||
| - cuda80=1.0=0 | |||||
| - pytorch=0.2.0=py27hc03bea1_4cu80 | |||||
| - torchvision=0.1.9=py27hdb88a65_1 | |||||
| - pip: | |||||
| - torch==0.2.0.post4 | |||||
| prefix: /home/asy/.conda/envs/pytorch | |||||
@@ -0,0 +1 @@
| log dir | |||||
@@ -0,0 +1 @@
This directory stores trained model parameters and network structure
@@ -0,0 +1,42 @@
| import os | |||||
| MODEL_STORE_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/model_store" | |||||
| ANNO_STORE_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/anno_store" | |||||
| LOG_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/log" | |||||
| USE_CUDA = True | |||||
| TRAIN_BATCH_SIZE = 512 | |||||
| TRAIN_LR = 0.01 | |||||
| END_EPOCH = 10 | |||||
| PNET_POSTIVE_ANNO_FILENAME = "pos_12.txt" | |||||
| PNET_NEGATIVE_ANNO_FILENAME = "neg_12.txt" | |||||
| PNET_PART_ANNO_FILENAME = "part_12.txt" | |||||
| PNET_LANDMARK_ANNO_FILENAME = "landmark_12.txt" | |||||
| RNET_POSTIVE_ANNO_FILENAME = "pos_24.txt" | |||||
| RNET_NEGATIVE_ANNO_FILENAME = "neg_24.txt" | |||||
| RNET_PART_ANNO_FILENAME = "part_24.txt" | |||||
| RNET_LANDMARK_ANNO_FILENAME = "landmark_24.txt" | |||||
| ONET_POSTIVE_ANNO_FILENAME = "pos_48.txt" | |||||
| ONET_NEGATIVE_ANNO_FILENAME = "neg_48.txt" | |||||
| ONET_PART_ANNO_FILENAME = "part_48.txt" | |||||
| ONET_LANDMARK_ANNO_FILENAME = "landmark_48.txt" | |||||
| PNET_TRAIN_IMGLIST_FILENAME = "imglist_anno_12.txt" | |||||
| RNET_TRAIN_IMGLIST_FILENAME = "imglist_anno_24.txt" | |||||
| ONET_TRAIN_IMGLIST_FILENAME = "imglist_anno_48.txt" | |||||
@@ -0,0 +1,632 @@
| import cv2 | |||||
| import time | |||||
| import numpy as np | |||||
| import torch | |||||
from torch.autograd import Variable
| from models import PNet,RNet,ONet | |||||
| import utils as utils | |||||
| import image_tools | |||||
| def create_mtcnn_net(p_model_path=None, r_model_path=None, o_model_path=None, use_cuda=True): | |||||
| pnet, rnet, onet = None, None, None | |||||
| if p_model_path is not None: | |||||
| pnet = PNet(use_cuda=use_cuda) | |||||
| pnet.load_state_dict(torch.load(p_model_path)) | |||||
| if(use_cuda): | |||||
| pnet.cuda() | |||||
| pnet.eval() | |||||
| if r_model_path is not None: | |||||
| rnet = RNet(use_cuda=use_cuda) | |||||
| rnet.load_state_dict(torch.load(r_model_path)) | |||||
| if (use_cuda): | |||||
| rnet.cuda() | |||||
| rnet.eval() | |||||
| if o_model_path is not None: | |||||
| onet = ONet(use_cuda=use_cuda) | |||||
| onet.load_state_dict(torch.load(o_model_path)) | |||||
| if (use_cuda): | |||||
| onet.cuda() | |||||
| onet.eval() | |||||
| return pnet,rnet,onet | |||||
| class MtcnnDetector(object): | |||||
| """ | |||||
| P,R,O net face detection and landmarks align | |||||
| """ | |||||
| def __init__(self, | |||||
| pnet = None, | |||||
| rnet = None, | |||||
| onet = None, | |||||
| min_face_size=12, | |||||
| stride=2, | |||||
| threshold=[0.6, 0.7, 0.7], | |||||
| scale_factor=0.709, | |||||
| ): | |||||
| self.pnet_detector = pnet | |||||
| self.rnet_detector = rnet | |||||
| self.onet_detector = onet | |||||
| self.min_face_size = min_face_size | |||||
| self.stride=stride | |||||
| self.thresh = threshold | |||||
| self.scale_factor = scale_factor | |||||
| def unique_image_format(self,im): | |||||
| if not isinstance(im,np.ndarray): | |||||
| if im.mode == 'I': | |||||
| im = np.array(im, np.int32, copy=False) | |||||
| elif im.mode == 'I;16': | |||||
| im = np.array(im, np.int16, copy=False) | |||||
| else: | |||||
| im = np.asarray(im) | |||||
| return im | |||||
| def square_bbox(self, bbox): | |||||
| """ | |||||
| convert bbox to square | |||||
| Parameters: | |||||
| ---------- | |||||
| bbox: numpy array , shape n x m | |||||
| input bbox | |||||
| Returns: | |||||
| ------- | |||||
| square bbox | |||||
| """ | |||||
| square_bbox = bbox.copy() | |||||
| h = bbox[:, 3] - bbox[:, 1] + 1 | |||||
| w = bbox[:, 2] - bbox[:, 0] + 1 | |||||
| l = np.maximum(h,w) | |||||
| square_bbox[:, 0] = bbox[:, 0] + w*0.5 - l*0.5 | |||||
| square_bbox[:, 1] = bbox[:, 1] + h*0.5 - l*0.5 | |||||
| square_bbox[:, 2] = square_bbox[:, 0] + l - 1 | |||||
| square_bbox[:, 3] = square_bbox[:, 1] + l - 1 | |||||
| return square_bbox | |||||
| def generate_bounding_box(self, map, reg, scale, threshold): | |||||
| """ | |||||
| generate bbox from feature map | |||||
| Parameters: | |||||
| ---------- | |||||
| map: numpy array , n x m x 1 | |||||
| detect score for each position | |||||
| reg: numpy array , n x m x 4 | |||||
| bbox | |||||
| scale: float number | |||||
| scale of this detection | |||||
| threshold: float number | |||||
| detect threshold | |||||
| Returns: | |||||
| ------- | |||||
| bbox array | |||||
| """ | |||||
| stride = 2 | |||||
| cellsize = 12 | |||||
| t_index = np.where(map > threshold) | |||||
| # find nothing | |||||
| if t_index[0].size == 0: | |||||
| return np.array([]) | |||||
| dx1, dy1, dx2, dy2 = [reg[0, t_index[0], t_index[1], i] for i in range(4)] | |||||
| reg = np.array([dx1, dy1, dx2, dy2]) | |||||
| # lefteye_dx, lefteye_dy, righteye_dx, righteye_dy, nose_dx, nose_dy, \ | |||||
| # leftmouth_dx, leftmouth_dy, rightmouth_dx, rightmouth_dy = [landmarks[0, t_index[0], t_index[1], i] for i in range(10)] | |||||
| # | |||||
| # landmarks = np.array([lefteye_dx, lefteye_dy, righteye_dx, righteye_dy, nose_dx, nose_dy, leftmouth_dx, leftmouth_dy, rightmouth_dx, rightmouth_dy]) | |||||
| score = map[t_index[0], t_index[1], 0] | |||||
| boundingbox = np.vstack([np.round((stride * t_index[1]) / scale), | |||||
| np.round((stride * t_index[0]) / scale), | |||||
| np.round((stride * t_index[1] + cellsize) / scale), | |||||
| np.round((stride * t_index[0] + cellsize) / scale), | |||||
| score, | |||||
| reg, | |||||
| # landmarks | |||||
| ]) | |||||
| return boundingbox.T | |||||
| def resize_image(self, img, scale): | |||||
| """ | |||||
resize image by the given scale factor
| Parameters: | |||||
| ---------- | |||||
| img: numpy array , height x width x channel | |||||
| input image, channels in BGR order here | |||||
| scale: float number | |||||
| scale factor of resize operation | |||||
| Returns: | |||||
| ------- | |||||
img_resized: numpy array, new_height x new_width x channel, the resized image
| """ | |||||
| height, width, channels = img.shape | |||||
| new_height = int(height * scale) # resized new height | |||||
| new_width = int(width * scale) # resized new width | |||||
| new_dim = (new_width, new_height) | |||||
| img_resized = cv2.resize(img, new_dim, interpolation=cv2.INTER_LINEAR) # resized image | |||||
| return img_resized | |||||
| def pad(self, bboxes, w, h): | |||||
| """ | |||||
pad the boxes
| Parameters: | |||||
| ---------- | |||||
| bboxes: numpy array, n x 5 | |||||
| input bboxes | |||||
| w: float number | |||||
| width of the input image | |||||
| h: float number | |||||
| height of the input image | |||||
| Returns : | |||||
| ------ | |||||
| dy, dx : numpy array, n x 1 | |||||
| start point of the bbox in target image | |||||
| edy, edx : numpy array, n x 1 | |||||
| end point of the bbox in target image | |||||
| y, x : numpy array, n x 1 | |||||
| start point of the bbox in original image | |||||
ey, ex : numpy array, n x 1
| end point of the bbox in original image | |||||
| tmph, tmpw: numpy array, n x 1 | |||||
| height and width of the bbox | |||||
| """ | |||||
| tmpw = (bboxes[:, 2] - bboxes[:, 0] + 1).astype(np.int32) | |||||
| tmph = (bboxes[:, 3] - bboxes[:, 1] + 1).astype(np.int32) | |||||
| numbox = bboxes.shape[0] | |||||
| dx = np.zeros((numbox, )) | |||||
| dy = np.zeros((numbox, )) | |||||
| edx, edy = tmpw.copy()-1, tmph.copy()-1 | |||||
| x, y, ex, ey = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3] | |||||
| tmp_index = np.where(ex > w-1) | |||||
| edx[tmp_index] = tmpw[tmp_index] + w - 2 - ex[tmp_index] | |||||
| ex[tmp_index] = w - 1 | |||||
| tmp_index = np.where(ey > h-1) | |||||
| edy[tmp_index] = tmph[tmp_index] + h - 2 - ey[tmp_index] | |||||
| ey[tmp_index] = h - 1 | |||||
| tmp_index = np.where(x < 0) | |||||
| dx[tmp_index] = 0 - x[tmp_index] | |||||
| x[tmp_index] = 0 | |||||
| tmp_index = np.where(y < 0) | |||||
| dy[tmp_index] = 0 - y[tmp_index] | |||||
| y[tmp_index] = 0 | |||||
| return_list = [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] | |||||
| return_list = [item.astype(np.int32) for item in return_list] | |||||
| return return_list | |||||
| def detect_pnet(self, im): | |||||
| """Get face candidates through pnet | |||||
| Parameters: | |||||
| ---------- | |||||
| im: numpy array | |||||
| input image array | |||||
| Returns: | |||||
| ------- | |||||
| boxes: numpy array | |||||
| detected boxes before calibration | |||||
| boxes_align: numpy array | |||||
| boxes after calibration | |||||
| """ | |||||
| # im = self.unique_image_format(im) | |||||
| h, w, c = im.shape | |||||
| net_size = 12 | |||||
| current_scale = float(net_size) / self.min_face_size # find initial scale | |||||
| im_resized = self.resize_image(im, current_scale) | |||||
| current_height, current_width, _ = im_resized.shape | |||||
| # fcn | |||||
| all_boxes = list() | |||||
| while min(current_height, current_width) > net_size: | |||||
| feed_imgs = [] | |||||
| image_tensor = image_tools.convert_image_to_tensor(im_resized) | |||||
| feed_imgs.append(image_tensor) | |||||
| feed_imgs = torch.stack(feed_imgs) | |||||
| feed_imgs = Variable(feed_imgs) | |||||
| if self.pnet_detector.use_cuda: | |||||
| feed_imgs = feed_imgs.cuda() | |||||
| cls_map, reg = self.pnet_detector(feed_imgs) | |||||
| cls_map_np = image_tools.convert_chwTensor_to_hwcNumpy(cls_map.cpu()) | |||||
| reg_np = image_tools.convert_chwTensor_to_hwcNumpy(reg.cpu()) | |||||
| # landmark_np = image_tools.convert_chwTensor_to_hwcNumpy(landmark.cpu()) | |||||
| boxes = self.generate_bounding_box(cls_map_np[ 0, :, :], reg_np, current_scale, self.thresh[0]) | |||||
| current_scale *= self.scale_factor | |||||
| im_resized = self.resize_image(im, current_scale) | |||||
| current_height, current_width, _ = im_resized.shape | |||||
| if boxes.size == 0: | |||||
| continue | |||||
| keep = utils.nms(boxes[:, :5], 0.5, 'Union') | |||||
| boxes = boxes[keep] | |||||
| all_boxes.append(boxes) | |||||
| if len(all_boxes) == 0: | |||||
| return None, None | |||||
| all_boxes = np.vstack(all_boxes) | |||||
| # merge the detection from first stage | |||||
| keep = utils.nms(all_boxes[:, 0:5], 0.7, 'Union') | |||||
| all_boxes = all_boxes[keep] | |||||
| # boxes = all_boxes[:, :5] | |||||
| bw = all_boxes[:, 2] - all_boxes[:, 0] + 1 | |||||
| bh = all_boxes[:, 3] - all_boxes[:, 1] + 1 | |||||
| # landmark_keep = all_boxes[:, 9:].reshape((5,2)) | |||||
| boxes = np.vstack([all_boxes[:,0], | |||||
| all_boxes[:,1], | |||||
| all_boxes[:,2], | |||||
| all_boxes[:,3], | |||||
| all_boxes[:,4], | |||||
| # all_boxes[:, 0] + all_boxes[:, 9] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:,10] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 11] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 12] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 13] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 14] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 15] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 16] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 17] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 18] * bh | |||||
| ]) | |||||
| boxes = boxes.T | |||||
| align_topx = all_boxes[:, 0] + all_boxes[:, 5] * bw | |||||
| align_topy = all_boxes[:, 1] + all_boxes[:, 6] * bh | |||||
| align_bottomx = all_boxes[:, 2] + all_boxes[:, 7] * bw | |||||
| align_bottomy = all_boxes[:, 3] + all_boxes[:, 8] * bh | |||||
| # refine the boxes | |||||
| boxes_align = np.vstack([ align_topx, | |||||
| align_topy, | |||||
| align_bottomx, | |||||
| align_bottomy, | |||||
| all_boxes[:, 4], | |||||
| # align_topx + all_boxes[:,9] * bw, | |||||
| # align_topy + all_boxes[:,10] * bh, | |||||
| # align_topx + all_boxes[:,11] * bw, | |||||
| # align_topy + all_boxes[:,12] * bh, | |||||
| # align_topx + all_boxes[:,13] * bw, | |||||
| # align_topy + all_boxes[:,14] * bh, | |||||
| # align_topx + all_boxes[:,15] * bw, | |||||
| # align_topy + all_boxes[:,16] * bh, | |||||
| # align_topx + all_boxes[:,17] * bw, | |||||
| # align_topy + all_boxes[:,18] * bh, | |||||
| ]) | |||||
| boxes_align = boxes_align.T | |||||
| return boxes, boxes_align | |||||
| def detect_rnet(self, im, dets): | |||||
| """Get face candidates using rnet | |||||
| Parameters: | |||||
| ---------- | |||||
| im: numpy array | |||||
| input image array | |||||
| dets: numpy array | |||||
| detection results of pnet | |||||
| Returns: | |||||
| ------- | |||||
| boxes: numpy array | |||||
| detected boxes before calibration | |||||
| boxes_align: numpy array | |||||
| boxes after calibration | |||||
| """ | |||||
| h, w, c = im.shape | |||||
| if dets is None: | |||||
| return None,None | |||||
| dets = self.square_bbox(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = self.pad(dets, w, h) | |||||
| num_boxes = dets.shape[0] | |||||
| ''' | |||||
| # helper for setting RNet batch size | |||||
| batch_size = self.rnet_detector.batch_size | |||||
| ratio = float(num_boxes) / batch_size | |||||
| if ratio > 3 or ratio < 0.3: | |||||
| print "You may need to reset RNet batch size if this info appears frequently, \ | |||||
| face candidates:%d, current batch_size:%d"%(num_boxes, batch_size) | |||||
| ''' | |||||
| # cropped_ims_tensors = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32) | |||||
| cropped_ims_tensors = [] | |||||
| for i in range(num_boxes): | |||||
| tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8) | |||||
| tmp[dy[i]:edy[i]+1, dx[i]:edx[i]+1, :] = im[y[i]:ey[i]+1, x[i]:ex[i]+1, :] | |||||
| crop_im = cv2.resize(tmp, (24, 24)) | |||||
| crop_im_tensor = image_tools.convert_image_to_tensor(crop_im) | |||||
| # cropped_ims_tensors[i, :, :, :] = crop_im_tensor | |||||
| cropped_ims_tensors.append(crop_im_tensor) | |||||
| feed_imgs = Variable(torch.stack(cropped_ims_tensors)) | |||||
| if self.rnet_detector.use_cuda: | |||||
| feed_imgs = feed_imgs.cuda() | |||||
| cls_map, reg = self.rnet_detector(feed_imgs) | |||||
| cls_map = cls_map.cpu().data.numpy() | |||||
| reg = reg.cpu().data.numpy() | |||||
| # landmark = landmark.cpu().data.numpy() | |||||
| keep_inds = np.where(cls_map > self.thresh[1])[0] | |||||
| if len(keep_inds) > 0: | |||||
| boxes = dets[keep_inds] | |||||
| cls = cls_map[keep_inds] | |||||
| reg = reg[keep_inds] | |||||
| # landmark = landmark[keep_inds] | |||||
| else: | |||||
| return None, None | |||||
| keep = utils.nms(boxes, 0.7) | |||||
| if len(keep) == 0: | |||||
| return None, None | |||||
| keep_cls = cls[keep] | |||||
| keep_boxes = boxes[keep] | |||||
| keep_reg = reg[keep] | |||||
| # keep_landmark = landmark[keep] | |||||
| bw = keep_boxes[:, 2] - keep_boxes[:, 0] + 1 | |||||
| bh = keep_boxes[:, 3] - keep_boxes[:, 1] + 1 | |||||
| boxes = np.vstack([ keep_boxes[:,0], | |||||
| keep_boxes[:,1], | |||||
| keep_boxes[:,2], | |||||
| keep_boxes[:,3], | |||||
| keep_cls[:,0], | |||||
| # keep_boxes[:,0] + keep_landmark[:, 0] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 1] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 2] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 3] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 4] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 5] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 6] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 7] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 8] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| align_topx = keep_boxes[:,0] + keep_reg[:,0] * bw | |||||
| align_topy = keep_boxes[:,1] + keep_reg[:,1] * bh | |||||
| align_bottomx = keep_boxes[:,2] + keep_reg[:,2] * bw | |||||
| align_bottomy = keep_boxes[:,3] + keep_reg[:,3] * bh | |||||
| boxes_align = np.vstack([align_topx, | |||||
| align_topy, | |||||
| align_bottomx, | |||||
| align_bottomy, | |||||
| keep_cls[:, 0], | |||||
| # align_topx + keep_landmark[:, 0] * bw, | |||||
| # align_topy + keep_landmark[:, 1] * bh, | |||||
| # align_topx + keep_landmark[:, 2] * bw, | |||||
| # align_topy + keep_landmark[:, 3] * bh, | |||||
| # align_topx + keep_landmark[:, 4] * bw, | |||||
| # align_topy + keep_landmark[:, 5] * bh, | |||||
| # align_topx + keep_landmark[:, 6] * bw, | |||||
| # align_topy + keep_landmark[:, 7] * bh, | |||||
| # align_topx + keep_landmark[:, 8] * bw, | |||||
| # align_topy + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| boxes = boxes.T | |||||
| boxes_align = boxes_align.T | |||||
| return boxes, boxes_align | |||||
| def detect_onet(self, im, dets): | |||||
| """Get face candidates using onet | |||||
| Parameters: | |||||
| ---------- | |||||
| im: numpy array | |||||
| input image array | |||||
| dets: numpy array | |||||
| detection results of rnet | |||||
| Returns: | |||||
| ------- | |||||
| boxes_align: numpy array | |||||
| boxes after calibration | |||||
| landmarks_align: numpy array | |||||
| landmarks after calibration | |||||
| """ | |||||
| h, w, c = im.shape | |||||
| if dets is None: | |||||
| return None, None | |||||
| dets = self.square_bbox(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = self.pad(dets, w, h) | |||||
| num_boxes = dets.shape[0] | |||||
| # cropped_ims_tensors = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32) | |||||
| cropped_ims_tensors = [] | |||||
| for i in range(num_boxes): | |||||
| tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8) | |||||
| tmp[dy[i]:edy[i] + 1, dx[i]:edx[i] + 1, :] = im[y[i]:ey[i] + 1, x[i]:ex[i] + 1, :] | |||||
| crop_im = cv2.resize(tmp, (48, 48)) | |||||
| crop_im_tensor = image_tools.convert_image_to_tensor(crop_im) | |||||
| # cropped_ims_tensors[i, :, :, :] = crop_im_tensor | |||||
| cropped_ims_tensors.append(crop_im_tensor) | |||||
| feed_imgs = Variable(torch.stack(cropped_ims_tensors)) | |||||
if self.onet_detector.use_cuda:
| feed_imgs = feed_imgs.cuda() | |||||
| cls_map, reg, landmark = self.onet_detector(feed_imgs) | |||||
| cls_map = cls_map.cpu().data.numpy() | |||||
| reg = reg.cpu().data.numpy() | |||||
| landmark = landmark.cpu().data.numpy() | |||||
| keep_inds = np.where(cls_map > self.thresh[2])[0] | |||||
| if len(keep_inds) > 0: | |||||
| boxes = dets[keep_inds] | |||||
| cls = cls_map[keep_inds] | |||||
| reg = reg[keep_inds] | |||||
| landmark = landmark[keep_inds] | |||||
| else: | |||||
| return None, None | |||||
| keep = utils.nms(boxes, 0.7, mode="Minimum") | |||||
| if len(keep) == 0: | |||||
| return None, None | |||||
| keep_cls = cls[keep] | |||||
| keep_boxes = boxes[keep] | |||||
| keep_reg = reg[keep] | |||||
| keep_landmark = landmark[keep] | |||||
| bw = keep_boxes[:, 2] - keep_boxes[:, 0] + 1 | |||||
| bh = keep_boxes[:, 3] - keep_boxes[:, 1] + 1 | |||||
| align_topx = keep_boxes[:, 0] + keep_reg[:, 0] * bw | |||||
| align_topy = keep_boxes[:, 1] + keep_reg[:, 1] * bh | |||||
| align_bottomx = keep_boxes[:, 2] + keep_reg[:, 2] * bw | |||||
| align_bottomy = keep_boxes[:, 3] + keep_reg[:, 3] * bh | |||||
| align_landmark_topx = keep_boxes[:, 0] | |||||
| align_landmark_topy = keep_boxes[:, 1] | |||||
| boxes_align = np.vstack([align_topx, | |||||
| align_topy, | |||||
| align_bottomx, | |||||
| align_bottomy, | |||||
| keep_cls[:, 0], | |||||
| # align_topx + keep_landmark[:, 0] * bw, | |||||
| # align_topy + keep_landmark[:, 1] * bh, | |||||
| # align_topx + keep_landmark[:, 2] * bw, | |||||
| # align_topy + keep_landmark[:, 3] * bh, | |||||
| # align_topx + keep_landmark[:, 4] * bw, | |||||
| # align_topy + keep_landmark[:, 5] * bh, | |||||
| # align_topx + keep_landmark[:, 6] * bw, | |||||
| # align_topy + keep_landmark[:, 7] * bh, | |||||
| # align_topx + keep_landmark[:, 8] * bw, | |||||
| # align_topy + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| boxes_align = boxes_align.T | |||||
| landmark = np.vstack([ | |||||
| align_landmark_topx + keep_landmark[:, 0] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 1] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 2] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 3] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 4] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 5] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 6] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 7] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 8] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| landmark_align = landmark.T | |||||
| return boxes_align, landmark_align | |||||
| def detect_face(self,img): | |||||
| """Detect face over image | |||||
| """ | |||||
| boxes_align = np.array([]) | |||||
landmark_align = np.array([])
t = time.time()
t1 = t2 = t3 = 0  # stage timings; stay 0 if a stage detector is not configured
| # pnet | |||||
| if self.pnet_detector: | |||||
| boxes, boxes_align = self.detect_pnet(img) | |||||
| if boxes_align is None: | |||||
| return np.array([]), np.array([]) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| # rnet | |||||
| if self.rnet_detector: | |||||
| boxes, boxes_align = self.detect_rnet(img, boxes_align) | |||||
| if boxes_align is None: | |||||
| return np.array([]), np.array([]) | |||||
| t2 = time.time() - t | |||||
| t = time.time() | |||||
| # onet | |||||
| if self.onet_detector: | |||||
| boxes_align, landmark_align = self.detect_onet(img, boxes_align) | |||||
| if boxes_align is None: | |||||
| return np.array([]), np.array([]) | |||||
| t3 = time.time() - t | |||||
| t = time.time() | |||||
print("time cost " + '{:.3f}'.format(t1 + t2 + t3) + ' pnet {:.3f} rnet {:.3f} onet {:.3f}'.format(t1, t2, t3))
| return boxes_align, landmark_align | |||||
| @@ -0,0 +1,171 @@ | |||||
| import numpy as np | |||||
| import cv2 | |||||
| class TrainImageReader: | |||||
| def __init__(self, imdb, im_size, batch_size=128, shuffle=False): | |||||
| self.imdb = imdb | |||||
| self.batch_size = batch_size | |||||
| self.im_size = im_size | |||||
| self.shuffle = shuffle | |||||
| self.cur = 0 | |||||
| self.size = len(imdb) | |||||
| self.index = np.arange(self.size) | |||||
| self.num_classes = 2 | |||||
| self.batch = None | |||||
| self.data = None | |||||
| self.label = None | |||||
| self.label_names= ['label', 'bbox_target', 'landmark_target'] | |||||
| self.reset() | |||||
| self.get_batch() | |||||
| def reset(self): | |||||
| self.cur = 0 | |||||
| if self.shuffle: | |||||
| np.random.shuffle(self.index) | |||||
| def iter_next(self): | |||||
| return self.cur + self.batch_size <= self.size | |||||
| def __iter__(self): | |||||
| return self | |||||
| def __next__(self): | |||||
| return self.next() | |||||
| def next(self): | |||||
| if self.iter_next(): | |||||
| self.get_batch() | |||||
| self.cur += self.batch_size | |||||
| return self.data,self.label | |||||
| else: | |||||
| raise StopIteration | |||||
| def getindex(self): | |||||
return self.cur // self.batch_size
| def getpad(self): | |||||
| if self.cur + self.batch_size > self.size: | |||||
| return self.cur + self.batch_size - self.size | |||||
| else: | |||||
| return 0 | |||||
| def get_batch(self): | |||||
| cur_from = self.cur | |||||
| cur_to = min(cur_from + self.batch_size, self.size) | |||||
| imdb = [self.imdb[self.index[i]] for i in range(cur_from, cur_to)] | |||||
| data, label = get_minibatch(imdb) | |||||
| self.data = data['data'] | |||||
| self.label = [label[name] for name in self.label_names] | |||||
| class TestImageLoader: | |||||
| def __init__(self, imdb, batch_size=1, shuffle=False): | |||||
| self.imdb = imdb | |||||
| self.batch_size = batch_size | |||||
| self.shuffle = shuffle | |||||
| self.size = len(imdb) | |||||
| self.index = np.arange(self.size) | |||||
| self.cur = 0 | |||||
| self.data = None | |||||
| self.label = None | |||||
| self.reset() | |||||
| self.get_batch() | |||||
| def reset(self): | |||||
| self.cur = 0 | |||||
| if self.shuffle: | |||||
| np.random.shuffle(self.index) | |||||
| def iter_next(self): | |||||
| return self.cur + self.batch_size <= self.size | |||||
| def __iter__(self): | |||||
| return self | |||||
| def __next__(self): | |||||
| return self.next() | |||||
| def next(self): | |||||
| if self.iter_next(): | |||||
| self.get_batch() | |||||
| self.cur += self.batch_size | |||||
| return self.data | |||||
| else: | |||||
| raise StopIteration | |||||
| def getindex(self): | |||||
return self.cur // self.batch_size
| def getpad(self): | |||||
| if self.cur + self.batch_size > self.size: | |||||
| return self.cur + self.batch_size - self.size | |||||
| else: | |||||
| return 0 | |||||
| def get_batch(self): | |||||
| cur_from = self.cur | |||||
| cur_to = min(cur_from + self.batch_size, self.size) | |||||
| imdb = [self.imdb[self.index[i]] for i in range(cur_from, cur_to)] | |||||
| data= get_testbatch(imdb) | |||||
| self.data=data['data'] | |||||
| def get_minibatch(imdb): | |||||
| # im_size: 12, 24 or 48 | |||||
| num_images = len(imdb) | |||||
| processed_ims = list() | |||||
| cls_label = list() | |||||
| bbox_reg_target = list() | |||||
| landmark_reg_target = list() | |||||
| for i in range(num_images): | |||||
| im = cv2.imread(imdb[i]['image']) | |||||
| #im = Image.open(imdb[i]['image']) | |||||
| if imdb[i]['flipped']: | |||||
| im = im[:, ::-1, :] | |||||
| #im = im.transpose(Image.FLIP_LEFT_RIGHT) | |||||
| cls = imdb[i]['label'] | |||||
| bbox_target = imdb[i]['bbox_target'] | |||||
| landmark = imdb[i]['landmark_target'] | |||||
| processed_ims.append(im) | |||||
| cls_label.append(cls) | |||||
| bbox_reg_target.append(bbox_target) | |||||
| landmark_reg_target.append(landmark) | |||||
| im_array = np.asarray(processed_ims) | |||||
| label_array = np.array(cls_label) | |||||
| bbox_target_array = np.vstack(bbox_reg_target) | |||||
| landmark_target_array = np.vstack(landmark_reg_target) | |||||
| data = {'data': im_array} | |||||
| label = {'label': label_array, | |||||
| 'bbox_target': bbox_target_array, | |||||
| 'landmark_target': landmark_target_array | |||||
| } | |||||
| return data, label | |||||
| def get_testbatch(imdb): | |||||
| assert len(imdb) == 1, "Single batch only" | |||||
| im = cv2.imread(imdb[0]['image']) | |||||
| data = {'data': im} | |||||
| return data | |||||
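`TrainImageReader` above silently drops the tail items that do not fill a final batch (`iter_next` requires `cur + batch_size <= size`). A standalone toy reader illustrating the same iterator protocol (names here are illustrative, not part of the module):

```python
import numpy as np

class ToyReader:
    """Minimal sketch of the batching protocol used by TrainImageReader:
    fixed-size batches, StopIteration once fewer than batch_size remain."""
    def __init__(self, data, batch_size):
        self.data, self.batch_size, self.cur = data, batch_size, 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.cur + self.batch_size > len(self.data):
            raise StopIteration
        batch = self.data[self.cur:self.cur + self.batch_size]
        self.cur += self.batch_size
        return batch

batches = [b.tolist() for b in ToyReader(np.arange(10), 4)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]] -- the last 2 items are dropped
```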
| @@ -0,0 +1,40 @@ | |||||
| import torchvision.transforms as transforms | |||||
| import torch | |||||
| from torch.autograd.variable import Variable | |||||
| import numpy as np | |||||
| transform = transforms.ToTensor() | |||||
| def convert_image_to_tensor(image): | |||||
| """convert an image to pytorch tensor | |||||
| Parameters: | |||||
| ---------- | |||||
| image: numpy array , h * w * c | |||||
| Returns: | |||||
| ------- | |||||
| image_tensor: pytorch.FloatTensor, c * h * w | |||||
| """ | |||||
image = image.astype(np.float32)  # np.float was removed in newer NumPy
return transform(image)
| def convert_chwTensor_to_hwcNumpy(tensor): | |||||
| """convert a group images pytorch tensor(count * c * h * w) to numpy array images(count * h * w * c) | |||||
| Parameters: | |||||
| ---------- | |||||
| tensor: numpy array , count * c * h * w | |||||
| Returns: | |||||
| ------- | |||||
| numpy array images: count * h * w * c | |||||
| """ | |||||
| if isinstance(tensor, Variable): | |||||
| return np.transpose(tensor.data.numpy(), (0,2,3,1)) | |||||
| elif isinstance(tensor, torch.FloatTensor): | |||||
| return np.transpose(tensor.numpy(), (0,2,3,1)) | |||||
| else: | |||||
raise Exception("convert b*c*h*w tensor to b*h*w*c numpy error. This tensor must have 4 dimensions.")
| @@ -0,0 +1,162 @@ | |||||
| import os | |||||
| import numpy as np | |||||
| class ImageDB(object): | |||||
| def __init__(self, image_annotation_file, prefix_path='', mode='train'): | |||||
| self.prefix_path = prefix_path | |||||
| self.image_annotation_file = image_annotation_file | |||||
| self.classes = ['__background__', 'face'] | |||||
| self.num_classes = 2 | |||||
| self.image_set_index = self.load_image_set_index() | |||||
| self.num_images = len(self.image_set_index) | |||||
| self.mode = mode | |||||
| def load_image_set_index(self): | |||||
| """Get image index | |||||
| Parameters: | |||||
| ---------- | |||||
| Returns: | |||||
| ------- | |||||
image_set_index: list of str
relative paths of the images
| """ | |||||
| assert os.path.exists(self.image_annotation_file), 'Path does not exist: {}'.format(self.image_annotation_file) | |||||
| with open(self.image_annotation_file, 'r') as f: | |||||
| image_set_index = [x.strip().split(' ')[0] for x in f.readlines()] | |||||
| return image_set_index | |||||
| def load_imdb(self): | |||||
| """Get and save ground truth image database | |||||
| Parameters: | |||||
| ---------- | |||||
| Returns: | |||||
| ------- | |||||
| gt_imdb: dict | |||||
| image database with annotations | |||||
| """ | |||||
| #cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl') | |||||
| #if os.path.exists(cache_file): | |||||
| # with open(cache_file, 'rb') as f: | |||||
| # imdb = cPickle.load(f) | |||||
| # print '{} gt imdb loaded from {}'.format(self.name, cache_file) | |||||
| # return imdb | |||||
| gt_imdb = self.load_annotations() | |||||
| #with open(cache_file, 'wb') as f: | |||||
| # cPickle.dump(gt_imdb, f, cPickle.HIGHEST_PROTOCOL) | |||||
| return gt_imdb | |||||
| def real_image_path(self, index): | |||||
| """Given image index, return full path | |||||
| Parameters: | |||||
| ---------- | |||||
| index: str | |||||
| relative path of image | |||||
| Returns: | |||||
| ------- | |||||
| image_file: str | |||||
| full path of image | |||||
| """ | |||||
| index = index.replace("\\", "/") | |||||
| if not os.path.exists(index): | |||||
| image_file = os.path.join(self.prefix_path, index) | |||||
| else: | |||||
| image_file=index | |||||
| if not image_file.endswith('.jpg'): | |||||
| image_file = image_file + '.jpg' | |||||
| assert os.path.exists(image_file), 'Path does not exist: {}'.format(image_file) | |||||
| return image_file | |||||
| def load_annotations(self,annotion_type=1): | |||||
| """Load annotations | |||||
| Parameters: | |||||
| ---------- | |||||
| annotion_type: int | |||||
annotation format selector (currently unused)
| Returns: | |||||
| ------- | |||||
| imdb: dict | |||||
| image database with annotations | |||||
| """ | |||||
| assert os.path.exists(self.image_annotation_file), 'annotations not found at {}'.format(self.image_annotation_file) | |||||
| with open(self.image_annotation_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| imdb = [] | |||||
| for i in range(self.num_images): | |||||
| annotation = annotations[i].strip().split(' ') | |||||
| index = annotation[0] | |||||
| im_path = self.real_image_path(index) | |||||
| imdb_ = dict() | |||||
| imdb_['image'] = im_path | |||||
| if self.mode == 'test': | |||||
| # gt_boxes = map(float, annotation[1:]) | |||||
| # boxes = np.array(bbox, dtype=np.float32).reshape(-1, 4) | |||||
| # imdb_['gt_boxes'] = boxes | |||||
| pass | |||||
| else: | |||||
| label = annotation[1] | |||||
| imdb_['label'] = int(label) | |||||
| imdb_['flipped'] = False | |||||
| imdb_['bbox_target'] = np.zeros((4,)) | |||||
| imdb_['landmark_target'] = np.zeros((10,)) | |||||
| if len(annotation[2:])==4: | |||||
| bbox_target = annotation[2:6] | |||||
| imdb_['bbox_target'] = np.array(bbox_target).astype(float) | |||||
| if len(annotation[2:])==14: | |||||
| bbox_target = annotation[2:6] | |||||
| imdb_['bbox_target'] = np.array(bbox_target).astype(float) | |||||
| landmark = annotation[6:] | |||||
| imdb_['landmark_target'] = np.array(landmark).astype(float) | |||||
| imdb.append(imdb_) | |||||
| return imdb | |||||
| def append_flipped_images(self, imdb): | |||||
| """append flipped images to imdb | |||||
| Parameters: | |||||
| ---------- | |||||
| imdb: imdb | |||||
| image database | |||||
| Returns: | |||||
| ------- | |||||
| imdb: dict | |||||
| image database with flipped image annotations added | |||||
| """ | |||||
print('append flipped images to imdb', len(imdb))
| for i in range(len(imdb)): | |||||
| imdb_ = imdb[i] | |||||
| m_bbox = imdb_['bbox_target'].copy() | |||||
| m_bbox[0], m_bbox[2] = -m_bbox[2], -m_bbox[0] | |||||
| landmark_ = imdb_['landmark_target'].copy() | |||||
| landmark_ = landmark_.reshape((5, 2)) | |||||
| landmark_ = np.asarray([(1 - x, y) for (x, y) in landmark_]) | |||||
| landmark_[[0, 1]] = landmark_[[1, 0]] | |||||
| landmark_[[3, 4]] = landmark_[[4, 3]] | |||||
| item = {'image': imdb_['image'], | |||||
| 'label': imdb_['label'], | |||||
| 'bbox_target': m_bbox, | |||||
| 'landmark_target': landmark_.reshape((10)), | |||||
| 'flipped': True} | |||||
| imdb.append(item) | |||||
| self.image_set_index *= 2 | |||||
| return imdb | |||||
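A standalone recomputation of the flip transforms in `append_flipped_images` on concrete numbers. Landmark coordinates are assumed normalized to [0, 1], and the index swaps presumably exchange the left/right eye and mouth-corner points (the usual 5-point MTCNN ordering):

```python
import numpy as np

# Bounding-box regression offsets: a horizontal flip swaps and negates
# the x components, exactly as append_flipped_images does.
bbox = np.array([0.1, 0.2, 0.3, 0.4])
m_bbox = bbox.copy()
m_bbox[0], m_bbox[2] = -m_bbox[2], -m_bbox[0]
print(m_bbox.tolist())  # [-0.3, 0.2, -0.1, 0.4]

# Landmarks: mirror x, then swap the paired points so the semantic order
# (left eye, right eye, nose, left mouth, right mouth -- assumed) is kept.
lm = np.array([[0.25, 0.25],    # point 0: left eye (assumed)
               [0.625, 0.25],   # point 1: right eye (assumed)
               [0.5, 0.5],      # point 2: nose
               [0.25, 0.75],    # point 3: left mouth corner (assumed)
               [0.625, 0.75]])  # point 4: right mouth corner (assumed)
flipped = np.asarray([(1 - x, y) for (x, y) in lm])
flipped[[0, 1]] = flipped[[1, 0]]
flipped[[3, 4]] = flipped[[4, 3]]
print(flipped[0].tolist())  # [0.375, 0.25] -- mirrored right eye becomes point 0
```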
| @@ -0,0 +1,207 @@ | |||||
| import torch | |||||
| import torch.nn as nn | |||||
| import torch.nn.functional as F | |||||
| def weights_init(m): | |||||
| if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear): | |||||
nn.init.xavier_uniform_(m.weight.data)
nn.init.constant_(m.bias, 0.1)
| class LossFn: | |||||
| def __init__(self, cls_factor=1, box_factor=1, landmark_factor=1): | |||||
| # loss function | |||||
| self.cls_factor = cls_factor | |||||
| self.box_factor = box_factor | |||||
| self.land_factor = landmark_factor | |||||
| self.loss_cls = nn.BCELoss() | |||||
| self.loss_box = nn.MSELoss() | |||||
| self.loss_landmark = nn.MSELoss() | |||||
| def cls_loss(self,gt_label,pred_label): | |||||
| pred_label = torch.squeeze(pred_label) | |||||
| gt_label = torch.squeeze(gt_label) | |||||
# keep labels >= 0: only 0 (negative) and 1 (positive) affect the detection loss
| mask = torch.ge(gt_label,0) | |||||
| valid_gt_label = torch.masked_select(gt_label,mask) | |||||
| valid_pred_label = torch.masked_select(pred_label,mask) | |||||
| return self.loss_cls(valid_pred_label,valid_gt_label)*self.cls_factor | |||||
| def box_loss(self,gt_label,gt_offset,pred_offset): | |||||
| pred_offset = torch.squeeze(pred_offset) | |||||
| gt_offset = torch.squeeze(gt_offset) | |||||
| gt_label = torch.squeeze(gt_label) | |||||
| #get the mask element which != 0 | |||||
| unmask = torch.eq(gt_label,0) | |||||
| mask = torch.eq(unmask,0) | |||||
| #convert mask to dim index | |||||
| chose_index = torch.nonzero(mask.data) | |||||
| chose_index = torch.squeeze(chose_index) | |||||
# only valid elements affect the loss
| valid_gt_offset = gt_offset[chose_index,:] | |||||
| valid_pred_offset = pred_offset[chose_index,:] | |||||
| return self.loss_box(valid_pred_offset,valid_gt_offset)*self.box_factor | |||||
| def landmark_loss(self,gt_label,gt_landmark,pred_landmark): | |||||
| pred_landmark = torch.squeeze(pred_landmark) | |||||
| gt_landmark = torch.squeeze(gt_landmark) | |||||
| gt_label = torch.squeeze(gt_label) | |||||
| mask = torch.eq(gt_label,-2) | |||||
| chose_index = torch.nonzero(mask.data) | |||||
| chose_index = torch.squeeze(chose_index) | |||||
| valid_gt_landmark = gt_landmark[chose_index, :] | |||||
| valid_pred_landmark = pred_landmark[chose_index, :] | |||||
| return self.loss_landmark(valid_pred_landmark,valid_gt_landmark)*self.land_factor | |||||
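The three losses above each select a different subset of the mini-batch by label value: `cls_loss` keeps labels >= 0, `box_loss` keeps labels != 0, and `landmark_loss` keeps labels == -2 (`torch.masked_select` behaves like the boolean indexing below). A NumPy sketch of the selection, assuming the usual MTCNN label convention (1 positive, 0 negative, -1 part face, -2 landmark sample):

```python
import numpy as np

labels = np.array([1, 0, -1, -2, 1])

cls_mask = labels >= 0   # classification: positives and negatives only
box_mask = labels != 0   # bbox regression: everything except negatives
lmk_mask = labels == -2  # landmark regression: landmark samples only

print(labels[cls_mask].tolist())  # [1, 0, 1]
print(labels[box_mask].tolist())  # [1, -1, -2, 1]
print(labels[lmk_mask].tolist())  # [-2]
```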
| class PNet(nn.Module): | |||||
| ''' PNet ''' | |||||
| def __init__(self, is_train=False, use_cuda=True): | |||||
| super(PNet, self).__init__() | |||||
| self.is_train = is_train | |||||
| self.use_cuda = use_cuda | |||||
| # backend | |||||
| self.pre_layer = nn.Sequential( | |||||
| nn.Conv2d(3, 10, kernel_size=3, stride=1), # conv1 | |||||
| nn.PReLU(), # PReLU1 | |||||
| nn.MaxPool2d(kernel_size=2, stride=2), # pool1 | |||||
| nn.Conv2d(10, 16, kernel_size=3, stride=1), # conv2 | |||||
| nn.PReLU(), # PReLU2 | |||||
| nn.Conv2d(16, 32, kernel_size=3, stride=1), # conv3 | |||||
| nn.PReLU() # PReLU3 | |||||
| ) | |||||
| # detection | |||||
| self.conv4_1 = nn.Conv2d(32, 1, kernel_size=1, stride=1) | |||||
| # bounding box regresion | |||||
| self.conv4_2 = nn.Conv2d(32, 4, kernel_size=1, stride=1) | |||||
| # landmark localization | |||||
| self.conv4_3 = nn.Conv2d(32, 10, kernel_size=1, stride=1) | |||||
# weight initialization with xavier
| self.apply(weights_init) | |||||
| def forward(self, x): | |||||
| x = self.pre_layer(x) | |||||
label = torch.sigmoid(self.conv4_1(x))
| offset = self.conv4_2(x) | |||||
| # landmark = self.conv4_3(x) | |||||
| if self.is_train is True: | |||||
| # label_loss = LossUtil.label_loss(self.gt_label,torch.squeeze(label)) | |||||
| # bbox_loss = LossUtil.bbox_loss(self.gt_bbox,torch.squeeze(offset)) | |||||
| return label,offset | |||||
| #landmark = self.conv4_3(x) | |||||
| return label, offset | |||||
| class RNet(nn.Module): | |||||
| ''' RNet ''' | |||||
| def __init__(self,is_train=False, use_cuda=True): | |||||
| super(RNet, self).__init__() | |||||
| self.is_train = is_train | |||||
| self.use_cuda = use_cuda | |||||
| # backend | |||||
| self.pre_layer = nn.Sequential( | |||||
| nn.Conv2d(3, 28, kernel_size=3, stride=1), # conv1 | |||||
| nn.PReLU(), # prelu1 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool1 | |||||
| nn.Conv2d(28, 48, kernel_size=3, stride=1), # conv2 | |||||
| nn.PReLU(), # prelu2 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool2 | |||||
| nn.Conv2d(48, 64, kernel_size=2, stride=1), # conv3 | |||||
| nn.PReLU() # prelu3 | |||||
| ) | |||||
| self.conv4 = nn.Linear(64*2*2, 128) # conv4 | |||||
| self.prelu4 = nn.PReLU() # prelu4 | |||||
| # detection | |||||
| self.conv5_1 = nn.Linear(128, 1) | |||||
| # bounding box regression | |||||
| self.conv5_2 = nn.Linear(128, 4) | |||||
# landmark localization
| self.conv5_3 = nn.Linear(128, 10) | |||||
# weight initialization with xavier
| self.apply(weights_init) | |||||
| def forward(self, x): | |||||
| # backend | |||||
| x = self.pre_layer(x) | |||||
| x = x.view(x.size(0), -1) | |||||
| x = self.conv4(x) | |||||
| x = self.prelu4(x) | |||||
| # detection | |||||
| det = torch.sigmoid(self.conv5_1(x)) | |||||
| box = self.conv5_2(x) | |||||
| # landmark = self.conv5_3(x) | |||||
| if self.is_train is True: | |||||
| return det, box | |||||
| #landmard = self.conv5_3(x) | |||||
| return det, box | |||||
| class ONet(nn.Module): | |||||
''' ONet '''
| def __init__(self,is_train=False, use_cuda=True): | |||||
| super(ONet, self).__init__() | |||||
| self.is_train = is_train | |||||
| self.use_cuda = use_cuda | |||||
| # backend | |||||
| self.pre_layer = nn.Sequential( | |||||
| nn.Conv2d(3, 32, kernel_size=3, stride=1), # conv1 | |||||
| nn.PReLU(), # prelu1 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool1 | |||||
| nn.Conv2d(32, 64, kernel_size=3, stride=1), # conv2 | |||||
| nn.PReLU(), # prelu2 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool2 | |||||
| nn.Conv2d(64, 64, kernel_size=3, stride=1), # conv3 | |||||
| nn.PReLU(), # prelu3 | |||||
| nn.MaxPool2d(kernel_size=2,stride=2), # pool3 | |||||
| nn.Conv2d(64,128,kernel_size=2,stride=1), # conv4 | |||||
| nn.PReLU() # prelu4 | |||||
| ) | |||||
| self.conv5 = nn.Linear(128*2*2, 256) # conv5 | |||||
| self.prelu5 = nn.PReLU() # prelu5 | |||||
| # detection | |||||
| self.conv6_1 = nn.Linear(256, 1) | |||||
| # bounding box regression | |||||
| self.conv6_2 = nn.Linear(256, 4) | |||||
# landmark localization
| self.conv6_3 = nn.Linear(256, 10) | |||||
# weight initialization with xavier
| self.apply(weights_init) | |||||
| def forward(self, x): | |||||
| # backend | |||||
| x = self.pre_layer(x) | |||||
| x = x.view(x.size(0), -1) | |||||
| x = self.conv5(x) | |||||
| x = self.prelu5(x) | |||||
| # detection | |||||
| det = torch.sigmoid(self.conv6_1(x)) | |||||
| box = self.conv6_2(x) | |||||
| landmark = self.conv6_3(x) | |||||
| if self.is_train is True: | |||||
| return det, box, landmark | |||||
| #landmard = self.conv5_3(x) | |||||
| return det, box, landmark | |||||
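The fully-connected sizes above (`64*2*2` for RNet, `128*2*2` for ONet) follow from the canonical MTCNN input sizes of 12, 24, and 48 pixels. A quick arithmetic check, using PyTorch's default `ceil_mode=False` pooling (i.e. floor division) and no padding:

```python
def stack(n, layers):
    # spatial size after a chain of conv/pool layers, each (kernel, stride),
    # with no padding and floor rounding (MaxPool2d default ceil_mode=False)
    for k, s in layers:
        n = (n - k) // s + 1
    return n

# PNet: conv3 -> pool2/2 -> conv3 -> conv3, 12x12 input -> 1x1 map
p = stack(12, [(3, 1), (2, 2), (3, 1), (3, 1)])
# RNet: 24x24 input -> 2x2 before conv4 = nn.Linear(64*2*2, 128)
r = stack(24, [(3, 1), (3, 2), (3, 1), (3, 2), (2, 1)])
# ONet: 48x48 input -> 2x2 before conv5 = nn.Linear(128*2*2, 256)
o = stack(48, [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1), (2, 2), (2, 1)])

print(p, r, o)  # 1 2 2
```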
| @@ -0,0 +1,42 @@ | |||||
| import numpy as np | |||||
| def torch_nms(dets, thresh, mode="Union"): | |||||
| """ | |||||
| greedily select boxes with high confidence | |||||
| keep boxes overlap <= thresh | |||||
| rule out overlap > thresh | |||||
| :param dets: [[x1, y1, x2, y2 score]] | |||||
| :param thresh: retain overlap <= thresh | |||||
| :return: indexes to keep | |||||
| """ | |||||
| x1 = dets[:, 0] | |||||
| y1 = dets[:, 1] | |||||
| x2 = dets[:, 2] | |||||
| y2 = dets[:, 3] | |||||
| scores = dets[:, 4] | |||||
| areas = (x2 - x1 + 1) * (y2 - y1 + 1) | |||||
| order = scores.argsort()[::-1] | |||||
| keep = [] | |||||
| while order.size > 0: | |||||
| i = order[0] | |||||
| keep.append(i) | |||||
| xx1 = np.maximum(x1[i], x1[order[1:]]) | |||||
| yy1 = np.maximum(y1[i], y1[order[1:]]) | |||||
| xx2 = np.minimum(x2[i], x2[order[1:]]) | |||||
| yy2 = np.minimum(y2[i], y2[order[1:]]) | |||||
| w = np.maximum(0.0, xx2 - xx1 + 1) | |||||
| h = np.maximum(0.0, yy2 - yy1 + 1) | |||||
| inter = w * h | |||||
| if mode == "Union": | |||||
| ovr = inter / (areas[i] + areas[order[1:]] - inter) | |||||
| elif mode == "Minimum": | |||||
| ovr = inter / np.minimum(areas[i], areas[order[1:]]) | |||||
| inds = np.where(ovr <= thresh)[0] | |||||
| order = order[inds + 1] | |||||
| return keep | |||||
| @@ -0,0 +1,2 @@ | |||||
| import numpy as np | |||||
| @@ -0,0 +1,101 @@ | |||||
| import numpy as np | |||||
| def IoU(box, boxes): | |||||
| """Compute IoU between detect box and gt boxes | |||||
| Parameters: | |||||
| ---------- | |||||
| box: numpy array , shape (5, ): x1, y1, x2, y2, score | |||||
| input box | |||||
| boxes: numpy array, shape (n, 4): x1, y1, x2, y2 | |||||
| input ground truth boxes | |||||
| Returns: | |||||
| ------- | |||||
| ovr: numpy.array, shape (n, ) | |||||
| IoU | |||||
| """ | |||||
| box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1) | |||||
| area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1) | |||||
| xx1 = np.maximum(box[0], boxes[:, 0]) | |||||
| yy1 = np.maximum(box[1], boxes[:, 1]) | |||||
| xx2 = np.minimum(box[2], boxes[:, 2]) | |||||
| yy2 = np.minimum(box[3], boxes[:, 3]) | |||||
| # compute the width and height of the bounding box | |||||
| w = np.maximum(0, xx2 - xx1 + 1) | |||||
| h = np.maximum(0, yy2 - yy1 + 1) | |||||
| inter = w * h | |||||
| ovr = np.true_divide(inter,(box_area + area - inter)) | |||||
| #ovr = inter / (box_area + area - inter) | |||||
| return ovr | |||||
| def convert_to_square(bbox): | |||||
| """Convert bbox to square | |||||
| Parameters: | |||||
| ---------- | |||||
| bbox: numpy array , shape n x 5 | |||||
| input bbox | |||||
| Returns: | |||||
| ------- | |||||
| square bbox | |||||
| """ | |||||
| square_bbox = bbox.copy() | |||||
| h = bbox[:, 3] - bbox[:, 1] + 1 | |||||
| w = bbox[:, 2] - bbox[:, 0] + 1 | |||||
| max_side = np.maximum(h,w) | |||||
| square_bbox[:, 0] = bbox[:, 0] + w*0.5 - max_side*0.5 | |||||
| square_bbox[:, 1] = bbox[:, 1] + h*0.5 - max_side*0.5 | |||||
| square_bbox[:, 2] = square_bbox[:, 0] + max_side - 1 | |||||
| square_bbox[:, 3] = square_bbox[:, 1] + max_side - 1 | |||||
| return square_bbox | |||||
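The same arithmetic as `convert_to_square`, applied to one concrete wide box for illustration: the longer side is kept and the box is recentred along the shorter axis.

```python
import numpy as np

bbox = np.array([[10., 20., 49., 29., 0.9]])  # w = 40, h = 10
h = bbox[:, 3] - bbox[:, 1] + 1
w = bbox[:, 2] - bbox[:, 0] + 1
max_side = np.maximum(h, w)                   # 40

sq = bbox.copy()
sq[:, 0] = bbox[:, 0] + w * 0.5 - max_side * 0.5  # x unchanged (w is the max side)
sq[:, 1] = bbox[:, 1] + h * 0.5 - max_side * 0.5  # y recentred upward
sq[:, 2] = sq[:, 0] + max_side - 1
sq[:, 3] = sq[:, 1] + max_side - 1

print(sq[0, :4].tolist())  # [10.0, 5.0, 49.0, 44.0] -- a 40x40 square
```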
| def nms(dets, thresh, mode="Union"): | |||||
| """ | |||||
| greedily select boxes with high confidence | |||||
| keep boxes overlap <= thresh | |||||
| rule out overlap > thresh | |||||
| :param dets: [[x1, y1, x2, y2 score]] | |||||
| :param thresh: retain overlap <= thresh | |||||
| :return: indexes to keep | |||||
| """ | |||||
| x1 = dets[:, 0] | |||||
| y1 = dets[:, 1] | |||||
| x2 = dets[:, 2] | |||||
| y2 = dets[:, 3] | |||||
| scores = dets[:, 4] | |||||
| areas = (x2 - x1 + 1) * (y2 - y1 + 1) | |||||
| order = scores.argsort()[::-1] | |||||
| keep = [] | |||||
| while order.size > 0: | |||||
| i = order[0] | |||||
| keep.append(i) | |||||
| xx1 = np.maximum(x1[i], x1[order[1:]]) | |||||
| yy1 = np.maximum(y1[i], y1[order[1:]]) | |||||
| xx2 = np.minimum(x2[i], x2[order[1:]]) | |||||
| yy2 = np.minimum(y2[i], y2[order[1:]]) | |||||
| w = np.maximum(0.0, xx2 - xx1 + 1) | |||||
| h = np.maximum(0.0, yy2 - yy1 + 1) | |||||
| inter = w * h | |||||
| if mode == "Union": | |||||
| ovr = inter / (areas[i] + areas[order[1:]] - inter) | |||||
| elif mode == "Minimum": | |||||
| ovr = inter / np.minimum(areas[i], areas[order[1:]]) | |||||
| inds = np.where(ovr <= thresh)[0] | |||||
| order = order[inds + 1] | |||||
| return keep | |||||
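"Minimum" mode divides the intersection by the smaller area instead of the union, so a detection fully contained in a larger one scores 1.0 and is suppressed even when its union IoU is low. A standalone recomputation of the loop's overlap arithmetic on two concrete boxes:

```python
# Two boxes: the second is fully contained in the first.
a = [0, 0, 9, 9]   # area 10 * 10 = 100
b = [2, 2, 6, 6]   # area  5 *  5 =  25

xx1, yy1 = max(a[0], b[0]), max(a[1], b[1])
xx2, yy2 = min(a[2], b[2]), min(a[3], b[3])
inter = max(0, xx2 - xx1 + 1) * max(0, yy2 - yy1 + 1)  # 25
area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)

union_iou = inter / (area_a + area_b - inter)  # "Union" mode
minimum_iou = inter / min(area_a, area_b)      # "Minimum" mode

print(union_iou, minimum_iou)  # 0.25 1.0
```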
| @@ -0,0 +1,141 @@ | |||||
| from matplotlib.patches import Circle | |||||
| def vis_two(im_array, dets1, dets2, thresh=0.9): | |||||
| """Visualize detection results before and after calibration | |||||
| Parameters: | |||||
| ---------- | |||||
| im_array: numpy.ndarray, shape(1, c, h, w) | |||||
| test image in rgb | |||||
| dets1: numpy.ndarray([[x1 y1 x2 y2 score]]) | |||||
| detection results before calibration | |||||
| dets2: numpy.ndarray([[x1 y1 x2 y2 score]]) | |||||
| detection results after calibration | |||||
| thresh: float | |||||
| boxes with scores > thresh will be drawn in red otherwise yellow | |||||
| Returns: | |||||
| ------- | |||||
| """ | |||||
| import matplotlib.pyplot as plt | |||||
| import random | |||||
| figure = plt.figure() | |||||
| plt.subplot(121) | |||||
| plt.imshow(im_array) | |||||
| color = 'yellow' | |||||
| for i in range(dets1.shape[0]): | |||||
| bbox = dets1[i, :4] | |||||
| landmarks = dets1[i, 5:] | |||||
| score = dets1[i, 4] | |||||
| if score > thresh: | |||||
| rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| bbox[2] - bbox[0], | |||||
| bbox[3] - bbox[1], fill=False, | |||||
| edgecolor='red', linewidth=0.7) | |||||
| plt.gca().add_patch(rect) | |||||
| landmarks = landmarks.reshape((5,2)) | |||||
| for j in range(5): | |||||
| plt.scatter(landmarks[j,0],landmarks[j,1],c='yellow',linewidths=0.1, marker='x', s=5) | |||||
| # plt.gca().text(bbox[0], bbox[1] - 2, | |||||
| # '{:.3f}'.format(score), | |||||
| # bbox=dict(facecolor='blue', alpha=0.5), fontsize=12, color='white') | |||||
| # else: | |||||
| # rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| # bbox[2] - bbox[0], | |||||
| # bbox[3] - bbox[1], fill=False, | |||||
| # edgecolor=color, linewidth=0.5) | |||||
| # plt.gca().add_patch(rect) | |||||
| plt.subplot(122) | |||||
| plt.imshow(im_array) | |||||
| color = 'yellow' | |||||
| for i in range(dets2.shape[0]): | |||||
| bbox = dets2[i, :4] | |||||
landmarks = dets2[i, 5:]
| score = dets2[i, 4] | |||||
| if score > thresh: | |||||
| rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| bbox[2] - bbox[0], | |||||
| bbox[3] - bbox[1], fill=False, | |||||
| edgecolor='red', linewidth=0.7) | |||||
| plt.gca().add_patch(rect) | |||||
| landmarks = landmarks.reshape((5, 2)) | |||||
| for j in range(5): | |||||
| plt.scatter(landmarks[j, 0], landmarks[j, 1], c='yellow',linewidths=0.1, marker='x', s=5) | |||||
| # plt.gca().text(bbox[0], bbox[1] - 2, | |||||
| # '{:.3f}'.format(score), | |||||
| # bbox=dict(facecolor='blue', alpha=0.5), fontsize=12, color='white') | |||||
| # else: | |||||
| # rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| # bbox[2] - bbox[0], | |||||
| # bbox[3] - bbox[1], fill=False, | |||||
| # edgecolor=color, linewidth=0.5) | |||||
| # plt.gca().add_patch(rect) | |||||
| plt.show() | |||||
def vis_face(im_array, dets, landmarks=None):
"""Visualize detection results with optional facial landmarks
Parameters:
----------
im_array: numpy.ndarray, shape(1, c, h, w)
test image in rgb
dets: numpy.ndarray([[x1 y1 x2 y2 score]])
detection results
landmarks: numpy.ndarray, shape (n, 10), optional
five (x, y) landmark points per detection
Returns:
-------
"""
| import matplotlib.pyplot as plt | |||||
| import random | |||||
| import pylab | |||||
| figure = pylab.figure() | |||||
| # plt.subplot(121) | |||||
| pylab.imshow(im_array) | |||||
| figure.suptitle('DFace Detector', fontsize=20) | |||||
| for i in range(dets.shape[0]): | |||||
| bbox = dets[i, :4] | |||||
| rect = pylab.Rectangle((bbox[0], bbox[1]), | |||||
| bbox[2] - bbox[0], | |||||
| bbox[3] - bbox[1], fill=False, | |||||
| edgecolor='yellow', linewidth=0.9) | |||||
| pylab.gca().add_patch(rect) | |||||
| if landmarks is not None: | |||||
| for i in range(landmarks.shape[0]): | |||||
| landmarks_one = landmarks[i, :] | |||||
| landmarks_one = landmarks_one.reshape((5, 2)) | |||||
| for j in range(5): | |||||
| # pylab.scatter(landmarks_one[j, 0], landmarks_one[j, 1], c='yellow', linewidths=0.1, marker='x', s=5) | |||||
| cir1 = Circle(xy=(landmarks_one[j, 0], landmarks_one[j, 1]), radius=2, alpha=0.4, color="red") | |||||
| pylab.gca().add_patch(cir1) | |||||
| # plt.gca().text(bbox[0], bbox[1] - 2, | |||||
| # '{:.3f}'.format(score), | |||||
| # bbox=dict(facecolor='blue', alpha=0.5), fontsize=12, color='white') | |||||
| # else: | |||||
| # rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| # bbox[2] - bbox[0], | |||||
| # bbox[3] - bbox[1], fill=False, | |||||
| # edgecolor=color, linewidth=0.5) | |||||
| # plt.gca().add_patch(rect) | |||||
| pylab.show() | |||||
| @@ -0,0 +1,35 @@ | |||||
| import os | |||||
| import numpy.random as npr | |||||
| import numpy as np | |||||
| def assemble_data(output_file, anno_file_list=[]): | |||||
| #assemble the annotations to one file | |||||
| size = 12 | |||||
| if len(anno_file_list)==0: | |||||
| return 0 | |||||
| if os.path.exists(output_file): | |||||
| os.remove(output_file) | |||||
| for anno_file in anno_file_list: | |||||
| with open(anno_file, 'r') as f: | |||||
| anno_lines = f.readlines() | |||||
| base_num = 250000 | |||||
| if len(anno_lines) > base_num * 3: | |||||
| idx_keep = npr.choice(len(anno_lines), size=base_num * 3, replace=True) | |||||
| elif len(anno_lines) > 100000: | |||||
| idx_keep = npr.choice(len(anno_lines), size=len(anno_lines), replace=True) | |||||
| else: | |||||
| idx_keep = np.arange(len(anno_lines)) | |||||
| np.random.shuffle(idx_keep) | |||||
| chose_count = 0 | |||||
| with open(output_file, 'a+') as f: | |||||
| for idx in idx_keep: | |||||
| f.write(anno_lines[idx]) | |||||
| chose_count+=1 | |||||
| return chose_count | |||||
| @@ -0,0 +1,25 @@ | |||||
| import os | |||||
| import config | |||||
| import assemble as assemble | |||||
| if __name__ == '__main__': | |||||
| anno_list = [] | |||||
| net_landmark_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_LANDMARK_ANNO_FILENAME) | |||||
| net_postive_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_POSTIVE_ANNO_FILENAME) | |||||
| net_part_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_PART_ANNO_FILENAME) | |||||
| net_neg_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_NEGATIVE_ANNO_FILENAME) | |||||
| anno_list.append(net_postive_file) | |||||
| anno_list.append(net_part_file) | |||||
| anno_list.append(net_neg_file) | |||||
| anno_list.append(net_landmark_file) | |||||
| imglist_filename = config.ONET_TRAIN_IMGLIST_FILENAME | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| imglist_file = os.path.join(anno_dir, imglist_filename) | |||||
| chose_count = assemble.assemble_data(imglist_file ,anno_list) | |||||
print("ONet train annotation result file path: %s" % imglist_file)
| @@ -0,0 +1,25 @@ | |||||
| import os | |||||
| import config | |||||
| import assemble as assemble | |||||
| if __name__ == '__main__': | |||||
| anno_list = [] | |||||
| # pnet_landmark_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_LANDMARK_ANNO_FILENAME) | |||||
| pnet_postive_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_POSTIVE_ANNO_FILENAME) | |||||
| pnet_part_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_PART_ANNO_FILENAME) | |||||
| pnet_neg_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_NEGATIVE_ANNO_FILENAME) | |||||
| anno_list.append(pnet_postive_file) | |||||
| anno_list.append(pnet_part_file) | |||||
| anno_list.append(pnet_neg_file) | |||||
| # anno_list.append(pnet_landmark_file) | |||||
| imglist_filename = config.PNET_TRAIN_IMGLIST_FILENAME | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| imglist_file = os.path.join(anno_dir, imglist_filename) | |||||
| chose_count = assemble.assemble_data(imglist_file, anno_list) | |||||
| print "PNet train annotation result file path:%s" % imglist_file | |||||
| @@ -0,0 +1,25 @@ | |||||
| import os | |||||
| import config | |||||
| import assemble as assemble | |||||
| if __name__ == '__main__': | |||||
| anno_list = [] | |||||
| # rnet_landmark_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_LANDMARK_ANNO_FILENAME) | |||||
| rnet_postive_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_POSTIVE_ANNO_FILENAME) | |||||
| rnet_part_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_PART_ANNO_FILENAME) | |||||
| rnet_neg_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_NEGATIVE_ANNO_FILENAME) | |||||
| anno_list.append(rnet_postive_file) | |||||
| anno_list.append(rnet_part_file) | |||||
| anno_list.append(rnet_neg_file) | |||||
| # anno_list.append(rnet_landmark_file) | |||||
| imglist_filename = config.RNET_TRAIN_IMGLIST_FILENAME | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| imglist_file = os.path.join(anno_dir, imglist_filename) | |||||
| chose_count = assemble.assemble_data(imglist_file, anno_list) | |||||
| print "RNet train annotation result file path:%s" % imglist_file | |||||
| @@ -0,0 +1,220 @@ | |||||
| import argparse | |||||
| import cv2 | |||||
| import numpy as np | |||||
| from core.detect import MtcnnDetector,create_mtcnn_net | |||||
| from core.imagedb import ImageDB | |||||
| from core.image_reader import TestImageLoader | |||||
| import time | |||||
| import os | |||||
| import cPickle | |||||
| from core.utils import convert_to_square,IoU | |||||
| import config | |||||
| import core.vision as vision | |||||
| def gen_onet_data(data_dir, anno_file, pnet_model_file, rnet_model_file, prefix_path='', use_cuda=True, vis=False): | |||||
| pnet, rnet, _ = create_mtcnn_net(p_model_path=pnet_model_file, r_model_path=rnet_model_file, use_cuda=use_cuda) | |||||
| mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, min_face_size=12) | |||||
| imagedb = ImageDB(anno_file,mode="test",prefix_path=prefix_path) | |||||
| imdb = imagedb.load_imdb() | |||||
| image_reader = TestImageLoader(imdb,1,False) | |||||
| all_boxes = list() | |||||
| batch_idx = 0 | |||||
| for databatch in image_reader: | |||||
| if batch_idx % 100 == 0: | |||||
| print "%d images done" % batch_idx | |||||
| im = databatch | |||||
| t = time.time() | |||||
| p_boxes, p_boxes_align = mtcnn_detector.detect_pnet(im=im) | |||||
| boxes, boxes_align = mtcnn_detector.detect_rnet(im=im, dets=p_boxes_align) | |||||
| if boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| if vis: | |||||
| rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB) | |||||
| vision.vis_two(rgb_im, boxes, boxes_align) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| all_boxes.append(boxes_align) | |||||
| batch_idx += 1 | |||||
| save_path = config.MODEL_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.mkdir(save_path) | |||||
| save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time())) | |||||
| with open(save_file, 'wb') as f: | |||||
| cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL) | |||||
| gen_onet_sample_data(data_dir,anno_file,save_file) | |||||
| def gen_onet_sample_data(data_dir,anno_file,det_boxs_file): | |||||
| neg_save_dir = os.path.join(data_dir, "48/negative") | |||||
| pos_save_dir = os.path.join(data_dir, "48/positive") | |||||
| part_save_dir = os.path.join(data_dir, "48/part") | |||||
| for dir_path in [neg_save_dir, pos_save_dir, part_save_dir]: | |||||
| if not os.path.exists(dir_path): | |||||
| os.makedirs(dir_path) | |||||
| # load ground truth from annotation file | |||||
| # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| image_size = 48 | |||||
| net = "onet" | |||||
| im_idx_list = list() | |||||
| gt_boxes_list = list() | |||||
| num_of_images = len(annotations) | |||||
| print "processing %d images in total" % num_of_images | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_idx = annotation[0] | |||||
| boxes = map(float, annotation[1:]) | |||||
| boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4) | |||||
| im_idx_list.append(im_idx) | |||||
| gt_boxes_list.append(boxes) | |||||
| save_path = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.makedirs(save_path) | |||||
| f1 = open(os.path.join(save_path, 'pos_%d.txt' % image_size), 'w') | |||||
| f2 = open(os.path.join(save_path, 'neg_%d.txt' % image_size), 'w') | |||||
| f3 = open(os.path.join(save_path, 'part_%d.txt' % image_size), 'w') | |||||
| # the detections were pickled in binary mode, so read them back in binary mode | |||||
| det_handle = open(det_boxs_file, 'rb') | |||||
| det_boxes = cPickle.load(det_handle) | |||||
| print len(det_boxes), num_of_images | |||||
| assert len(det_boxes) == num_of_images, "incorrect detections or ground truths" | |||||
| # index of neg, pos and part face, used as their image names | |||||
| n_idx = 0 | |||||
| p_idx = 0 | |||||
| d_idx = 0 | |||||
| image_done = 0 | |||||
| for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list): | |||||
| if image_done % 100 == 0: | |||||
| print "%d images done" % image_done | |||||
| image_done += 1 | |||||
| if dets.shape[0] == 0: | |||||
| continue | |||||
| img = cv2.imread(im_idx) | |||||
| dets = convert_to_square(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| for box in dets: | |||||
| x_left, y_top, x_right, y_bottom = box[0:4].astype(int) | |||||
| width = x_right - x_left + 1 | |||||
| height = y_bottom - y_top + 1 | |||||
| # ignore box that is too small or beyond image border | |||||
| if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1: | |||||
| continue | |||||
| # compute intersection over union(IoU) between current box and all gt boxes | |||||
| Iou = IoU(box, gts) | |||||
| cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (image_size, image_size), | |||||
| interpolation=cv2.INTER_LINEAR) | |||||
| # save negative images and write label | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| else: | |||||
| # find gt_box with the highest iou | |||||
| idx = np.argmax(Iou) | |||||
| assigned_gt = gts[idx] | |||||
| x1, y1, x2, y2 = assigned_gt | |||||
| # compute bbox reg label | |||||
| offset_x1 = (x1 - x_left) / float(width) | |||||
| offset_y1 = (y1 - y_top) / float(height) | |||||
| offset_x2 = (x2 - x_right) / float(width) | |||||
| offset_y2 = (y2 - y_bottom) / float(height) | |||||
| # save positive and part-face images and write labels | |||||
| if np.max(Iou) >= 0.65: | |||||
| save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx) | |||||
| f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| elif np.max(Iou) >= 0.4: | |||||
| save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx) | |||||
| f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| d_idx += 1 | |||||
| f1.close() | |||||
| f2.close() | |||||
| f3.close() | |||||
| def model_store_path(): | |||||
| return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store" | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='output data folder', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--pmodel_file', dest='pnet_model_file', help='PNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/pnet_epoch_5best.pt', type=str) | |||||
| parser.add_argument('--rmodel_file', dest='rnet_model_file', help='RNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/rnet_epoch_1.pt', type=str) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_onet_data(args.dataset_path, args.annotation_file, args.pnet_model_file, args.rnet_model_file, args.prefix_path, args.use_cuda) | |||||
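The sampling logic above hinges on `core.utils.IoU`, whose implementation is not shown in this diff. As a hedged sketch (the exact code lives in `core/utils.py`; the formulas below follow the standard pixel-inclusive convention, matching the `+ 1` width/height arithmetic used in these scripts), IoU between one candidate box and an array of ground-truth boxes can be computed as:

```python
import numpy as np

def iou(box, boxes):
    # box: [x1, y1, x2, y2]; boxes: N x 4 array of ground-truth boxes
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    # corners of the intersection rectangles
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    # clamp at zero so disjoint boxes contribute no overlap
    w = np.maximum(0.0, xx2 - xx1 + 1)
    h = np.maximum(0.0, yy2 - yy1 + 1)
    inter = w * h
    return inter / (box_area + areas - inter)
```

The thresholds used above (`< 0.3` negative, `>= 0.4` part, `>= 0.65` positive) are then applied to `np.max` of this vector.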
| @@ -0,0 +1,174 @@ | |||||
| import argparse | |||||
| import numpy as np | |||||
| import cv2 | |||||
| import os | |||||
| import numpy.random as npr | |||||
| from core.utils import IoU | |||||
| import config | |||||
| def gen_pnet_data(data_dir,anno_file): | |||||
| neg_save_dir = os.path.join(data_dir,"12/negative") | |||||
| pos_save_dir = os.path.join(data_dir,"12/positive") | |||||
| part_save_dir = os.path.join(data_dir,"12/part") | |||||
| for dir_path in [neg_save_dir,pos_save_dir,part_save_dir]: | |||||
| if not os.path.exists(dir_path): | |||||
| os.makedirs(dir_path) | |||||
| save_dir = os.path.join(data_dir,"pnet") | |||||
| if not os.path.exists(save_dir): | |||||
| os.mkdir(save_dir) | |||||
| post_save_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_POSTIVE_ANNO_FILENAME) | |||||
| neg_save_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_NEGATIVE_ANNO_FILENAME) | |||||
| part_save_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_PART_ANNO_FILENAME) | |||||
| f1 = open(post_save_file, 'w') | |||||
| f2 = open(neg_save_file, 'w') | |||||
| f3 = open(part_save_file, 'w') | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| num = len(annotations) | |||||
| print "%d pics in total" % num | |||||
| p_idx = 0 | |||||
| n_idx = 0 | |||||
| d_idx = 0 | |||||
| idx = 0 | |||||
| box_idx = 0 | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_path = annotation[0] | |||||
| bbox = map(float, annotation[1:]) | |||||
| boxes = np.array(bbox, dtype=np.int32).reshape(-1, 4) | |||||
| img = cv2.imread(im_path) | |||||
| idx += 1 | |||||
| if idx % 100 == 0: | |||||
| print idx, "images done" | |||||
| height, width, channel = img.shape | |||||
| neg_num = 0 | |||||
| while neg_num < 50: | |||||
| size = npr.randint(12, min(width, height) / 2) | |||||
| nx = npr.randint(0, width - size) | |||||
| ny = npr.randint(0, height - size) | |||||
| crop_box = np.array([nx, ny, nx + size, ny + size]) | |||||
| Iou = IoU(crop_box, boxes) | |||||
| cropped_im = img[ny : ny + size, nx : nx + size, :] | |||||
| resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg"%n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| neg_num += 1 | |||||
| for box in boxes: | |||||
| # box (x_left, y_top, x_right, y_bottom) | |||||
| x1, y1, x2, y2 = box | |||||
| w = x2 - x1 + 1 | |||||
| h = y2 - y1 + 1 | |||||
| # ignore small faces | |||||
| # in case the ground truth boxes of small faces are not accurate | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # generate negative examples that have overlap with gt | |||||
| for i in range(5): | |||||
| size = npr.randint(12, min(width, height) / 2) | |||||
| # delta_x and delta_y are offsets of (x1, y1) | |||||
| delta_x = npr.randint(max(-size, -x1), w) | |||||
| delta_y = npr.randint(max(-size, -y1), h) | |||||
| nx1 = max(0, x1 + delta_x) | |||||
| ny1 = max(0, y1 + delta_y) | |||||
| if nx1 + size > width or ny1 + size > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx1 + size, ny1 + size]) | |||||
| Iou = IoU(crop_box, boxes) | |||||
| cropped_im = img[ny1 : ny1 + size, nx1 : nx1 + size, :] | |||||
| resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg"%n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| # generate positive examples and part faces | |||||
| for i in range(20): | |||||
| size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h))) | |||||
| # delta here is the offset of box center | |||||
| delta_x = npr.randint(-w * 0.2, w * 0.2) | |||||
| delta_y = npr.randint(-h * 0.2, h * 0.2) | |||||
| nx1 = max(x1 + w / 2 + delta_x - size / 2, 0) | |||||
| ny1 = max(y1 + h / 2 + delta_y - size / 2, 0) | |||||
| nx2 = nx1 + size | |||||
| ny2 = ny1 + size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| offset_x1 = (x1 - nx1) / float(size) | |||||
| offset_y1 = (y1 - ny1) / float(size) | |||||
| offset_x2 = (x2 - nx2) / float(size) | |||||
| offset_y2 = (y2 - ny2) / float(size) | |||||
| cropped_im = img[ny1 : ny2, nx1 : nx2, :] | |||||
| resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) | |||||
| box_ = box.reshape(1, -1) | |||||
| if IoU(crop_box, box_) >= 0.65: | |||||
| save_file = os.path.join(pos_save_dir, "%s.jpg"%p_idx) | |||||
| f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| elif IoU(crop_box, box_) >= 0.4: | |||||
| save_file = os.path.join(part_save_dir, "%s.jpg"%d_idx) | |||||
| f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| d_idx += 1 | |||||
| box_idx += 1 | |||||
| print "%s images done, pos: %s part: %s neg: %s"%(idx, p_idx, d_idx, n_idx) | |||||
| f1.close() | |||||
| f2.close() | |||||
| f3.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_pnet_data(args.dataset_path,args.annotation_file) | |||||
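The positive and part samples above store bounding-box regression targets as corner offsets normalized by the crop side length. A minimal sketch of that encoding and its inverse (how predicted offsets would be mapped back to pixel coordinates at inference; `encode_bbox`/`decode_bbox` are illustrative names, not functions from this repo, and the crops here are square so one `size` divisor suffices):

```python
def encode_bbox(gt, crop, size):
    # gt, crop: (x1, y1, x2, y2); size: side length of the square crop
    return tuple((g - c) / float(size) for g, c in zip(gt, crop))

def decode_bbox(offsets, crop, size):
    # inverse of encode_bbox: recover ground-truth corners from offsets
    return tuple(c + o * size for o, c in zip(offsets, crop))
```

Round-tripping a box through both functions returns the original corners, which is what makes the stored `%.2f` offsets usable as regression labels.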
| @@ -0,0 +1,219 @@ | |||||
| import argparse | |||||
| import cv2 | |||||
| import numpy as np | |||||
| from core.detect import MtcnnDetector,create_mtcnn_net | |||||
| from core.imagedb import ImageDB | |||||
| from core.image_reader import TestImageLoader | |||||
| import time | |||||
| import os | |||||
| import cPickle | |||||
| from core.utils import convert_to_square,IoU | |||||
| import config | |||||
| import core.vision as vision | |||||
| def gen_rnet_data(data_dir, anno_file, pnet_model_file, prefix_path='', use_cuda=True, vis=False): | |||||
| pnet, _, _ = create_mtcnn_net(p_model_path=pnet_model_file, use_cuda=use_cuda) | |||||
| mtcnn_detector = MtcnnDetector(pnet=pnet,min_face_size=12) | |||||
| imagedb = ImageDB(anno_file,mode="test",prefix_path=prefix_path) | |||||
| imdb = imagedb.load_imdb() | |||||
| image_reader = TestImageLoader(imdb,1,False) | |||||
| all_boxes = list() | |||||
| batch_idx = 0 | |||||
| for databatch in image_reader: | |||||
| if batch_idx % 100 == 0: | |||||
| print "%d images done" % batch_idx | |||||
| im = databatch | |||||
| t = time.time() | |||||
| boxes, boxes_align = mtcnn_detector.detect_pnet(im=im) | |||||
| if boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| if vis: | |||||
| rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB) | |||||
| vision.vis_two(rgb_im, boxes, boxes_align) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| all_boxes.append(boxes_align) | |||||
| batch_idx += 1 | |||||
| # save_path = model_store_path() | |||||
| save_path = config.MODEL_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.mkdir(save_path) | |||||
| save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time())) | |||||
| with open(save_file, 'wb') as f: | |||||
| cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL) | |||||
| gen_rnet_sample_data(data_dir,anno_file,save_file) | |||||
| def gen_rnet_sample_data(data_dir,anno_file,det_boxs_file): | |||||
| neg_save_dir = os.path.join(data_dir, "24/negative") | |||||
| pos_save_dir = os.path.join(data_dir, "24/positive") | |||||
| part_save_dir = os.path.join(data_dir, "24/part") | |||||
| for dir_path in [neg_save_dir, pos_save_dir, part_save_dir]: | |||||
| if not os.path.exists(dir_path): | |||||
| os.makedirs(dir_path) | |||||
| # load ground truth from annotation file | |||||
| # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| image_size = 24 | |||||
| net = "rnet" | |||||
| im_idx_list = list() | |||||
| gt_boxes_list = list() | |||||
| num_of_images = len(annotations) | |||||
| print "processing %d images in total" % num_of_images | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_idx = annotation[0] | |||||
| boxes = map(float, annotation[1:]) | |||||
| boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4) | |||||
| im_idx_list.append(im_idx) | |||||
| gt_boxes_list.append(boxes) | |||||
| save_path = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.makedirs(save_path) | |||||
| f1 = open(os.path.join(save_path, 'pos_%d.txt' % image_size), 'w') | |||||
| f2 = open(os.path.join(save_path, 'neg_%d.txt' % image_size), 'w') | |||||
| f3 = open(os.path.join(save_path, 'part_%d.txt' % image_size), 'w') | |||||
| # the detections were pickled in binary mode, so read them back in binary mode | |||||
| det_handle = open(det_boxs_file, 'rb') | |||||
| det_boxes = cPickle.load(det_handle) | |||||
| print len(det_boxes), num_of_images | |||||
| assert len(det_boxes) == num_of_images, "incorrect detections or ground truths" | |||||
| # index of neg, pos and part face, used as their image names | |||||
| n_idx = 0 | |||||
| p_idx = 0 | |||||
| d_idx = 0 | |||||
| image_done = 0 | |||||
| for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list): | |||||
| if image_done % 100 == 0: | |||||
| print "%d images done" % image_done | |||||
| image_done += 1 | |||||
| if dets.shape[0] == 0: | |||||
| continue | |||||
| img = cv2.imread(im_idx) | |||||
| dets = convert_to_square(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| for box in dets: | |||||
| x_left, y_top, x_right, y_bottom = box[0:4].astype(int) | |||||
| width = x_right - x_left + 1 | |||||
| height = y_bottom - y_top + 1 | |||||
| # ignore box that is too small or beyond image border | |||||
| if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1: | |||||
| continue | |||||
| # compute intersection over union(IoU) between current box and all gt boxes | |||||
| Iou = IoU(box, gts) | |||||
| cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (image_size, image_size), | |||||
| interpolation=cv2.INTER_LINEAR) | |||||
| # save negative images and write label | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| else: | |||||
| # find gt_box with the highest iou | |||||
| idx = np.argmax(Iou) | |||||
| assigned_gt = gts[idx] | |||||
| x1, y1, x2, y2 = assigned_gt | |||||
| # compute bbox reg label | |||||
| offset_x1 = (x1 - x_left) / float(width) | |||||
| offset_y1 = (y1 - y_top) / float(height) | |||||
| offset_x2 = (x2 - x_right) / float(width) | |||||
| offset_y2 = (y2 - y_bottom) / float(height) | |||||
| # save positive and part-face images and write labels | |||||
| if np.max(Iou) >= 0.65: | |||||
| save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx) | |||||
| f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| elif np.max(Iou) >= 0.4: | |||||
| save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx) | |||||
| f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| d_idx += 1 | |||||
| f1.close() | |||||
| f2.close() | |||||
| f3.close() | |||||
| def model_store_path(): | |||||
| return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store" | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--pmodel_file', dest='pnet_model_file', help='PNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/pnet_epoch_5best.pt', type=str) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_rnet_data(args.dataset_path, args.annotation_file, args.pnet_model_file, args.prefix_path, args.use_cuda) | |||||
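Both hard-sample generators call `core.utils.convert_to_square` on the detections before cropping. Its body is not part of this diff; a plausible sketch, assuming the canonical MTCNN behavior of growing the shorter side of each detection around its center so the crop can be resized without distortion:

```python
import numpy as np

def convert_to_square(bboxes):
    # bboxes: N x 4+ array of (x1, y1, x2, y2, ...) detections
    square = bboxes.copy()
    w = bboxes[:, 2] - bboxes[:, 0] + 1
    h = bboxes[:, 3] - bboxes[:, 1] + 1
    side = np.maximum(w, h)
    # re-center each box, then expand both sides to the longer dimension
    square[:, 0] = bboxes[:, 0] + w * 0.5 - side * 0.5
    square[:, 1] = bboxes[:, 1] + h * 0.5 - side * 0.5
    square[:, 2] = square[:, 0] + side - 1
    square[:, 3] = square[:, 1] + side - 1
    return square
```

Note the resulting `x1` can go negative, which is why the sampling loops above skip boxes that fall outside the image border.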
| @@ -0,0 +1,156 @@ | |||||
| # coding: utf-8 | |||||
| import os | |||||
| import cv2 | |||||
| import numpy as np | |||||
| import sys | |||||
| import numpy.random as npr | |||||
| import argparse | |||||
| import config | |||||
| import core.utils as utils | |||||
| def gen_data(anno_file, data_dir, prefix): | |||||
| size = 12 | |||||
| image_id = 0 | |||||
| landmark_imgs_save_dir = os.path.join(data_dir,"12/landmark") | |||||
| if not os.path.exists(landmark_imgs_save_dir): | |||||
| os.makedirs(landmark_imgs_save_dir) | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(anno_dir): | |||||
| os.makedirs(anno_dir) | |||||
| landmark_anno_filename = config.PNET_LANDMARK_ANNO_FILENAME | |||||
| save_landmark_anno = os.path.join(anno_dir,landmark_anno_filename) | |||||
| f = open(save_landmark_anno, 'w') | |||||
| # dstdir = "train_landmark_few" | |||||
| with open(anno_file, 'r') as f2: | |||||
| annotations = f2.readlines() | |||||
| num = len(annotations) | |||||
| print "%d pics in total" % num | |||||
| l_idx =0 | |||||
| idx = 0 | |||||
| # image_path bbox landmark(5*2) | |||||
| for annotation in annotations: | |||||
| # print imgPath | |||||
| annotation = annotation.strip().split(' ') | |||||
| assert len(annotation)==15,"each line should have 15 elements" | |||||
| im_path = os.path.join(prefix,annotation[0].replace("\\", "/")) | |||||
| gt_box = map(float, annotation[1:5]) | |||||
| gt_box = [gt_box[0], gt_box[2], gt_box[1], gt_box[3]] | |||||
| gt_box = np.array(gt_box, dtype=np.int32) | |||||
| landmark = map(float, annotation[5:]) | |||||
| landmark = np.array(landmark, dtype=np.float) | |||||
| img = cv2.imread(im_path) | |||||
| assert (img is not None) | |||||
| height, width, channel = img.shape | |||||
| # crop_face = img[gt_box[1]:gt_box[3]+1, gt_box[0]:gt_box[2]+1] | |||||
| # crop_face = cv2.resize(crop_face,(size,size)) | |||||
| idx = idx + 1 | |||||
| if idx % 100 == 0: | |||||
| print "%d images done, landmark images: %d"%(idx,l_idx) | |||||
| x1, y1, x2, y2 = gt_box | |||||
| # gt's width | |||||
| w = x2 - x1 + 1 | |||||
| # gt's height | |||||
| h = y2 - y1 + 1 | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # random shift | |||||
| for i in range(10): | |||||
| bbox_size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h))) | |||||
| delta_x = npr.randint(-w * 0.2, w * 0.2) | |||||
| delta_y = npr.randint(-h * 0.2, h * 0.2) | |||||
| nx1 = max(x1 + w / 2 - bbox_size / 2 + delta_x, 0) | |||||
| ny1 = max(y1 + h / 2 - bbox_size / 2 + delta_y, 0) | |||||
| nx2 = nx1 + bbox_size | |||||
| ny2 = ny1 + bbox_size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| cropped_im = img[ny1:ny2 + 1, nx1:nx2 + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (size, size),interpolation=cv2.INTER_LINEAR) | |||||
| offset_x1 = (x1 - nx1) / float(bbox_size) | |||||
| offset_y1 = (y1 - ny1) / float(bbox_size) | |||||
| offset_x2 = (x2 - nx2) / float(bbox_size) | |||||
| offset_y2 = (y2 - ny2) / float(bbox_size) | |||||
| offset_left_eye_x = (landmark[0] - nx1) / float(bbox_size) | |||||
| offset_left_eye_y = (landmark[1] - ny1) / float(bbox_size) | |||||
| offset_right_eye_x = (landmark[2] - nx1) / float(bbox_size) | |||||
| offset_right_eye_y = (landmark[3] - ny1) / float(bbox_size) | |||||
| offset_nose_x = (landmark[4] - nx1) / float(bbox_size) | |||||
| offset_nose_y = (landmark[5] - ny1) / float(bbox_size) | |||||
| offset_left_mouth_x = (landmark[6] - nx1) / float(bbox_size) | |||||
| offset_left_mouth_y = (landmark[7] - ny1) / float(bbox_size) | |||||
| offset_right_mouth_x = (landmark[8] - nx1) / float(bbox_size) | |||||
| offset_right_mouth_y = (landmark[9] - ny1) / float(bbox_size) | |||||
| # cal iou | |||||
| iou = utils.IoU(crop_box.astype(np.float), np.expand_dims(gt_box.astype(np.float), 0)) | |||||
| if np.max(iou) > 0.65: | |||||
| save_file = os.path.join(landmark_imgs_save_dir, "%s.jpg" % l_idx) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x,offset_left_eye_y,offset_right_eye_x,offset_right_eye_y,offset_nose_x,offset_nose_y,offset_left_mouth_x,offset_left_mouth_y,offset_right_mouth_x,offset_right_mouth_y)) | |||||
| l_idx += 1 | |||||
| f.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='../data/', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_data(args.annotation_file, args.dataset_path, args.prefix_path) | |||||
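The ten landmark offsets written above are all computed the same way: subtract the crop origin and divide by the crop side. A compact sketch of that normalization and its inverse (`encode_landmarks`/`decode_landmarks` are illustrative names, not part of this repo), which replaces the ten hand-written `offset_*` lines with one vectorized step:

```python
import numpy as np

def encode_landmarks(landmark, nx1, ny1, bbox_size):
    # landmark: flat array [x1, y1, ..., x5, y5] in image coordinates
    pts = np.asarray(landmark, dtype=float).reshape(-1, 2)
    return ((pts - np.array([nx1, ny1], dtype=float)) / float(bbox_size)).ravel()

def decode_landmarks(offsets, nx1, ny1, bbox_size):
    # inverse: recover image-coordinate landmarks from normalized offsets
    pts = np.asarray(offsets, dtype=float).reshape(-1, 2)
    return (pts * float(bbox_size) + np.array([nx1, ny1], dtype=float)).ravel()
```

Because the offsets are relative to the crop, the same labels remain valid after the crop is resized to the 12x12 (or 24x24/48x48) network input.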
| @@ -0,0 +1,154 @@ | |||||
| # coding: utf-8 | |||||
| import os | |||||
| import cv2 | |||||
| import numpy as np | |||||
| import random | |||||
| import sys | |||||
| import numpy.random as npr | |||||
| import argparse | |||||
| import config | |||||
| import core.utils as utils | |||||
| def gen_data(anno_file, data_dir, prefix): | |||||
| size = 24 | |||||
| image_id = 0 | |||||
| landmark_imgs_save_dir = os.path.join(data_dir,"24/landmark") | |||||
| if not os.path.exists(landmark_imgs_save_dir): | |||||
| os.makedirs(landmark_imgs_save_dir) | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(anno_dir): | |||||
| os.makedirs(anno_dir) | |||||
| landmark_anno_filename = config.RNET_LANDMARK_ANNO_FILENAME | |||||
| save_landmark_anno = os.path.join(anno_dir,landmark_anno_filename) | |||||
| f = open(save_landmark_anno, 'w') | |||||
| # dstdir = "train_landmark_few" | |||||
| with open(anno_file, 'r') as f2: | |||||
| annotations = f2.readlines() | |||||
| num = len(annotations) | |||||
| print "%d total images" % num | |||||
| l_idx =0 | |||||
| idx = 0 | |||||
| # image_path bbox landmark(5*2) | |||||
| for annotation in annotations: | |||||
| # print imgPath | |||||
| annotation = annotation.strip().split(' ') | |||||
| assert len(annotation)==15,"each line should have 15 elements" | |||||
| im_path = os.path.join(prefix,annotation[0].replace("\\", "/")) | |||||
| gt_box = map(float, annotation[1:5]) | |||||
| gt_box = [gt_box[0], gt_box[2], gt_box[1], gt_box[3]] | |||||
| gt_box = np.array(gt_box, dtype=np.int32) | |||||
| landmark = map(float, annotation[5:]) | |||||
| landmark = np.array(landmark, dtype=np.float) | |||||
| img = cv2.imread(im_path) | |||||
| assert (img is not None) | |||||
| height, width, channel = img.shape | |||||
| # crop_face = img[gt_box[1]:gt_box[3]+1, gt_box[0]:gt_box[2]+1] | |||||
| # crop_face = cv2.resize(crop_face,(size,size)) | |||||
| idx = idx + 1 | |||||
| if idx % 100 == 0: | |||||
| print "%d images done, landmark images: %d"%(idx,l_idx) | |||||
| x1, y1, x2, y2 = gt_box | |||||
| # gt's width | |||||
| w = x2 - x1 + 1 | |||||
| # gt's height | |||||
| h = y2 - y1 + 1 | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # random shift | |||||
| for i in range(10): | |||||
| bbox_size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h))) | |||||
| delta_x = npr.randint(-w * 0.2, w * 0.2) | |||||
| delta_y = npr.randint(-h * 0.2, h * 0.2) | |||||
| nx1 = max(x1 + w / 2 - bbox_size / 2 + delta_x, 0) | |||||
| ny1 = max(y1 + h / 2 - bbox_size / 2 + delta_y, 0) | |||||
| nx2 = nx1 + bbox_size | |||||
| ny2 = ny1 + bbox_size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| cropped_im = img[ny1:ny2 + 1, nx1:nx2 + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (size, size),interpolation=cv2.INTER_LINEAR) | |||||
| offset_x1 = (x1 - nx1) / float(bbox_size) | |||||
| offset_y1 = (y1 - ny1) / float(bbox_size) | |||||
| offset_x2 = (x2 - nx2) / float(bbox_size) | |||||
| offset_y2 = (y2 - ny2) / float(bbox_size) | |||||
| offset_left_eye_x = (landmark[0] - nx1) / float(bbox_size) | |||||
| offset_left_eye_y = (landmark[1] - ny1) / float(bbox_size) | |||||
| offset_right_eye_x = (landmark[2] - nx1) / float(bbox_size) | |||||
| offset_right_eye_y = (landmark[3] - ny1) / float(bbox_size) | |||||
| offset_nose_x = (landmark[4] - nx1) / float(bbox_size) | |||||
| offset_nose_y = (landmark[5] - ny1) / float(bbox_size) | |||||
| offset_left_mouth_x = (landmark[6] - nx1) / float(bbox_size) | |||||
| offset_left_mouth_y = (landmark[7] - ny1) / float(bbox_size) | |||||
| offset_right_mouth_x = (landmark[8] - nx1) / float(bbox_size) | |||||
| offset_right_mouth_y = (landmark[9] - ny1) / float(bbox_size) | |||||
| # cal iou | |||||
| iou = utils.IoU(crop_box.astype(np.float), np.expand_dims(gt_box.astype(np.float), 0)) | |||||
| if iou > 0.65: | |||||
| save_file = os.path.join(landmark_imgs_save_dir, "%s.jpg" % l_idx) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x,offset_left_eye_y,offset_right_eye_x,offset_right_eye_y,offset_nose_x,offset_nose_y,offset_left_mouth_x,offset_left_mouth_y,offset_right_mouth_x,offset_right_mouth_y)) | |||||
| l_idx += 1 | |||||
| f.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Generate landmark training data', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='/idata/data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='/idata/data/trainImageList.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='/idata/data', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_data(args.annotation_file, args.dataset_path, args.prefix_path) | |||||
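The crop-relative offset encoding used above (each ground-truth coordinate expressed as a fraction of the square crop's side, e.g. `offset_x1 = (x1 - nx1) / bbox_size`) round-trips cleanly. A minimal sketch; `encode_offsets`/`decode_offsets` are illustrative helpers, not part of DFace:

```python
import numpy as np

def encode_offsets(gt_box, crop_box):
    """Express gt corners as fractions of the (square) crop size,
    mirroring offset_x1 = (x1 - nx1) / bbox_size in the code above."""
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)
    x1, y1, x2, y2 = gt_box
    return np.array([(x1 - nx1) / size, (y1 - ny1) / size,
                     (x2 - nx2) / size, (y2 - ny2) / size])

def decode_offsets(offsets, crop_box):
    """Invert the encoding to recover the gt box from a crop."""
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)
    dx1, dy1, dx2, dy2 = offsets
    return np.array([nx1 + dx1 * size, ny1 + dy1 * size,
                     nx2 + dx2 * size, ny2 + dy2 * size])

crop = (90, 90, 190, 190)          # a 100x100 square crop
gt = (100, 95, 180, 185)           # ground-truth box inside it
off = encode_offsets(gt, crop)
recovered = decode_offsets(off, crop)
```

Because the network predicts these normalized offsets, the decoder is what turns its output back into pixel coordinates at inference time.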
| @@ -0,0 +1,153 @@ | |||||
| # coding: utf-8 | |||||
| import os | |||||
| import cv2 | |||||
| import numpy as np | |||||
| import random | |||||
| import sys | |||||
| import numpy.random as npr | |||||
| import argparse | |||||
| import config | |||||
| import core.utils as utils | |||||
| def gen_data(anno_file, data_dir, prefix): | |||||
| size = 48 | |||||
| image_id = 0 | |||||
| landmark_imgs_save_dir = os.path.join(data_dir,"48/landmark") | |||||
| if not os.path.exists(landmark_imgs_save_dir): | |||||
| os.makedirs(landmark_imgs_save_dir) | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(anno_dir): | |||||
| os.makedirs(anno_dir) | |||||
| landmark_anno_filename = config.ONET_LANDMARK_ANNO_FILENAME | |||||
| save_landmark_anno = os.path.join(anno_dir,landmark_anno_filename) | |||||
| f = open(save_landmark_anno, 'w') | |||||
| # dstdir = "train_landmark_few" | |||||
| with open(anno_file, 'r') as f2: | |||||
| annotations = f2.readlines() | |||||
| num = len(annotations) | |||||
| print "%d total images" % num | |||||
| l_idx = 0 | |||||
| idx = 0 | |||||
| # image_path bbox landmark(5*2) | |||||
| for annotation in annotations: | |||||
| # print imgPath | |||||
| annotation = annotation.strip().split(' ') | |||||
| assert len(annotation)==15, "each line should have 15 elements" | |||||
| im_path = os.path.join(prefix,annotation[0].replace("\\", "/")) | |||||
| gt_box = map(float, annotation[1:5]) | |||||
| # gt_box = [gt_box[0], gt_box[2], gt_box[1], gt_box[3]] | |||||
| gt_box = np.array(gt_box, dtype=np.int32) | |||||
| landmark = map(float, annotation[5:]) | |||||
| landmark = np.array(landmark, dtype=np.float) | |||||
| img = cv2.imread(im_path) | |||||
| assert (img is not None) | |||||
| height, width, channel = img.shape | |||||
| # crop_face = img[gt_box[1]:gt_box[3]+1, gt_box[0]:gt_box[2]+1] | |||||
| # crop_face = cv2.resize(crop_face,(size,size)) | |||||
| idx = idx + 1 | |||||
| if idx % 100 == 0: | |||||
| print "%d images done, landmark images: %d"%(idx,l_idx) | |||||
| x1, y1, x2, y2 = gt_box | |||||
| # gt's width | |||||
| w = x2 - x1 + 1 | |||||
| # gt's height | |||||
| h = y2 - y1 + 1 | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # random shift | |||||
| for i in range(10): | |||||
| bbox_size = npr.randint(int(min(w, h) * 0.8), int(np.ceil(1.25 * max(w, h)))) | |||||
| delta_x = npr.randint(int(-w * 0.2), int(w * 0.2)) | |||||
| delta_y = npr.randint(int(-h * 0.2), int(h * 0.2)) | |||||
| nx1 = max(x1 + w / 2 - bbox_size / 2 + delta_x, 0) | |||||
| ny1 = max(y1 + h / 2 - bbox_size / 2 + delta_y, 0) | |||||
| nx2 = nx1 + bbox_size | |||||
| ny2 = ny1 + bbox_size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| cropped_im = img[ny1:ny2 + 1, nx1:nx2 + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (size, size),interpolation=cv2.INTER_LINEAR) | |||||
| offset_x1 = (x1 - nx1) / float(bbox_size) | |||||
| offset_y1 = (y1 - ny1) / float(bbox_size) | |||||
| offset_x2 = (x2 - nx2) / float(bbox_size) | |||||
| offset_y2 = (y2 - ny2) / float(bbox_size) | |||||
| offset_left_eye_x = (landmark[0] - nx1) / float(bbox_size) | |||||
| offset_left_eye_y = (landmark[1] - ny1) / float(bbox_size) | |||||
| offset_right_eye_x = (landmark[2] - nx1) / float(bbox_size) | |||||
| offset_right_eye_y = (landmark[3] - ny1) / float(bbox_size) | |||||
| offset_nose_x = (landmark[4] - nx1) / float(bbox_size) | |||||
| offset_nose_y = (landmark[5] - ny1) / float(bbox_size) | |||||
| offset_left_mouth_x = (landmark[6] - nx1) / float(bbox_size) | |||||
| offset_left_mouth_y = (landmark[7] - ny1) / float(bbox_size) | |||||
| offset_right_mouth_x = (landmark[8] - nx1) / float(bbox_size) | |||||
| offset_right_mouth_y = (landmark[9] - ny1) / float(bbox_size) | |||||
| # cal iou | |||||
| iou = utils.IoU(crop_box.astype(np.float), np.expand_dims(gt_box.astype(np.float), 0)) | |||||
| if iou > 0.65: | |||||
| save_file = os.path.join(landmark_imgs_save_dir, "%s.jpg" % l_idx) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x,offset_left_eye_y,offset_right_eye_x,offset_right_eye_y,offset_nose_x,offset_nose_y,offset_left_mouth_x,offset_left_mouth_y,offset_right_mouth_x,offset_right_mouth_y)) | |||||
| l_idx += 1 | |||||
| f.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Generate landmark training data', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='/idata/data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='/idata/data/trainImageList.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='/idata/data', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_data(args.annotation_file, args.dataset_path, args.prefix_path) | |||||
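The script above keeps a crop only when `utils.IoU` against the ground-truth box exceeds 0.65. A minimal one-box-vs-many IoU sketch (an assumption about the helper's behaviour, not DFace's exact `core.utils.IoU` implementation):

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box [x1, y1, x2, y2] against an (N, 4) array of boxes,
    using the same +1 pixel convention as the width/height code above."""
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    # clamp negative widths/heights to zero for disjoint boxes
    inter = np.maximum(0, xx2 - xx1 + 1) * np.maximum(0, yy2 - yy1 + 1)
    return inter / (area + areas - inter)

gt = np.array([[0., 0., 9., 9.]])              # one 10x10 ground-truth box
same = iou(np.array([0., 0., 9., 9.]), gt)     # identical box
far = iou(np.array([20., 20., 29., 29.]), gt)  # disjoint box
```

The 0.65 threshold then selects only crops that tightly overlap a real face, which is what makes them usable as landmark-regression positives.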
| @@ -0,0 +1,234 @@ | |||||
| import argparse | |||||
| import cv2 | |||||
| import numpy as np | |||||
| from core.detect import MtcnnDetector,create_mtcnn_net | |||||
| from core.imagedb import ImageDB | |||||
| from core.image_reader import TestImageLoader | |||||
| import time | |||||
| import os | |||||
| import cPickle | |||||
| from core.utils import convert_to_square,IoU | |||||
| import config | |||||
| import core.vision as vision | |||||
| def gen_landmark48_data(data_dir, anno_file, pnet_model_file, rnet_model_file, prefix_path='', use_cuda=True, vis=False): | |||||
| pnet, rnet, _ = create_mtcnn_net(p_model_path=pnet_model_file, r_model_path=rnet_model_file, use_cuda=use_cuda) | |||||
| mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, min_face_size=12) | |||||
| imagedb = ImageDB(anno_file,mode="test",prefix_path=prefix_path) | |||||
| imdb = imagedb.load_imdb() | |||||
| image_reader = TestImageLoader(imdb,1,False) | |||||
| all_boxes = list() | |||||
| batch_idx = 0 | |||||
| for databatch in image_reader: | |||||
| if batch_idx % 100 == 0: | |||||
| print "%d images done" % batch_idx | |||||
| im = databatch | |||||
| if im.shape[0] >= 1200 or im.shape[1] >=1200: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| t = time.time() | |||||
| p_boxes, p_boxes_align = mtcnn_detector.detect_pnet(im=im) | |||||
| # guard against images where PNet finds no candidates before calling RNet | |||||
| if p_boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| boxes, boxes_align = mtcnn_detector.detect_rnet(im=im, dets=p_boxes_align) | |||||
| if boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| if vis: | |||||
| rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB) | |||||
| vision.vis_two(rgb_im, boxes, boxes_align) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| all_boxes.append(boxes_align) | |||||
| batch_idx += 1 | |||||
| save_path = config.MODEL_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.mkdir(save_path) | |||||
| save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time())) | |||||
| with open(save_file, 'wb') as f: | |||||
| cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL) | |||||
| gen_sample_data(data_dir,anno_file,save_file, prefix_path) | |||||
| def gen_sample_data(data_dir, anno_file, det_boxs_file, prefix_path =''): | |||||
| landmark_save_dir = os.path.join(data_dir, "48/landmark") | |||||
| if not os.path.exists(landmark_save_dir): | |||||
| os.makedirs(landmark_save_dir) | |||||
| # load ground truth from annotation file | |||||
| # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| image_size = 48 | |||||
| net = "onet" | |||||
| im_idx_list = list() | |||||
| gt_boxes_list = list() | |||||
| gt_landmark_list = list() | |||||
| num_of_images = len(annotations) | |||||
| print "processing %d images in total" % num_of_images | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_idx = annotation[0] | |||||
| boxes = map(float, annotation[1:5]) | |||||
| boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4) | |||||
| landmarks = map(float, annotation[5:]) | |||||
| landmarks = np.array(landmarks, dtype=np.float32).reshape(-1, 10) | |||||
| im_idx_list.append(im_idx) | |||||
| gt_boxes_list.append(boxes) | |||||
| gt_landmark_list.append(landmarks) | |||||
| save_path = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.makedirs(save_path) | |||||
| f = open(os.path.join(save_path, 'landmark_48.txt'), 'w') | |||||
| det_handle = open(det_boxs_file, 'rb')  # binary mode for cPickle | |||||
| det_boxes = cPickle.load(det_handle) | |||||
| print len(det_boxes), num_of_images | |||||
| assert len(det_boxes) == num_of_images, "incorrect detections or ground truths" | |||||
| # index of saved landmark (positive) crops, used as their image file names | |||||
| p_idx = 0 | |||||
| image_done = 0 | |||||
| for im_idx, dets, gts, landmark in zip(im_idx_list, det_boxes, gt_boxes_list, gt_landmark_list): | |||||
| if image_done % 100 == 0: | |||||
| print "%d images done" % image_done | |||||
| image_done += 1 | |||||
| if dets.shape[0] == 0: | |||||
| continue | |||||
| img = cv2.imread(os.path.join(prefix_path,im_idx)) | |||||
| dets = convert_to_square(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| for box in dets: | |||||
| x_left, y_top, x_right, y_bottom = box[0:4].astype(int) | |||||
| width = x_right - x_left + 1 | |||||
| height = y_bottom - y_top + 1 | |||||
| # ignore box that is too small or beyond image border | |||||
| if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1: | |||||
| continue | |||||
| # compute intersection over union(IoU) between current box and all gt boxes | |||||
| Iou = IoU(box, gts) | |||||
| cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (image_size, image_size), | |||||
| interpolation=cv2.INTER_LINEAR) | |||||
| # skip crops that overlap no ground truth; negatives are not needed for landmarks | |||||
| if np.max(Iou) < 0.3: | |||||
| # IoU with all gts must be below 0.3 | |||||
| continue | |||||
| else: | |||||
| # find gt_box with the highest iou | |||||
| idx = np.argmax(Iou) | |||||
| assigned_gt = gts[idx] | |||||
| x1, y1, x2, y2 = assigned_gt | |||||
| # compute bbox reg label | |||||
| offset_x1 = (x1 - x_left) / float(width) | |||||
| offset_y1 = (y1 - y_top) / float(height) | |||||
| offset_x2 = (x2 - x_right) / float(width) | |||||
| offset_y2 = (y2 - y_bottom) / float(height) | |||||
| offset_left_eye_x = (landmark[0,0] - x_left) / float(width) | |||||
| offset_left_eye_y = (landmark[0,1] - y_top) / float(height) | |||||
| offset_right_eye_x = (landmark[0,2] - x_left) / float(width) | |||||
| offset_right_eye_y = (landmark[0,3] - y_top) / float(height) | |||||
| offset_nose_x = (landmark[0,4] - x_left) / float(width) | |||||
| offset_nose_y = (landmark[0,5] - y_top) / float(height) | |||||
| offset_left_mouth_x = (landmark[0,6] - x_left) / float(width) | |||||
| offset_left_mouth_y = (landmark[0,7] - y_top) / float(height) | |||||
| offset_right_mouth_x = (landmark[0,8] - x_left) / float(width) | |||||
| offset_right_mouth_y = (landmark[0,9] - y_top) / float(height) | |||||
| # save positive samples (IoU >= 0.65) and write bbox + landmark labels | |||||
| if np.max(Iou) >= 0.65: | |||||
| save_file = os.path.join(landmark_save_dir, "%s.jpg" % p_idx) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x, offset_left_eye_y, offset_right_eye_x, offset_right_eye_y, | |||||
| offset_nose_x, offset_nose_y, offset_left_mouth_x, offset_left_mouth_y, | |||||
| offset_right_mouth_x, offset_right_mouth_y)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| f.close() | |||||
| def model_store_path(): | |||||
| return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store" | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Generate ONet landmark training data', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--pmodel_file', dest='pnet_model_file', help='PNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/pnet_epoch_5best.pt', type=str) | |||||
| parser.add_argument('--rmodel_file', dest='rnet_model_file', help='RNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/rnet_epoch_1.pt', type=str) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_landmark48_data(args.dataset_path, args.annotation_file, args.pnet_model_file, args.rnet_model_file, args.prefix_path, args.use_cuda) | |||||
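Before cropping, the script calls `convert_to_square` so every RNet detection becomes a square region around the same centre. A plausible sketch of that helper (assumed behaviour; the real implementation lives in `core.utils`):

```python
import numpy as np

def convert_to_square(bboxes):
    """Expand [x1, y1, x2, y2, ...] rows to squares with the same centre
    and side max(w, h). Extra columns (e.g. scores) are preserved."""
    square = bboxes.copy()
    w = bboxes[:, 2] - bboxes[:, 0] + 1
    h = bboxes[:, 3] - bboxes[:, 1] + 1
    side = np.maximum(w, h)
    square[:, 0] = bboxes[:, 0] + w * 0.5 - side * 0.5
    square[:, 1] = bboxes[:, 1] + h * 0.5 - side * 0.5
    square[:, 2] = square[:, 0] + side - 1
    square[:, 3] = square[:, 1] + side - 1
    return square

dets = np.array([[10., 10., 29., 49., 0.9]])   # a 20x40 box with a score column
sq = convert_to_square(dets)
```

Square crops matter here because the crop is then resized to 48x48; a non-square crop would distort face geometry and corrupt the landmark labels.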
| @@ -0,0 +1,281 @@ | |||||
| from core.image_reader import TrainImageReader | |||||
| import datetime | |||||
| import os | |||||
| from core.models import PNet,RNet,ONet,LossFn | |||||
| import torch | |||||
| from torch.autograd import Variable | |||||
| import core.image_tools as image_tools | |||||
| def compute_accuracy(prob_cls, gt_cls): | |||||
| prob_cls = torch.squeeze(prob_cls) | |||||
| gt_cls = torch.squeeze(gt_cls) | |||||
| #we only need the detection which >= 0 | |||||
| mask = torch.ge(gt_cls,0) | |||||
| #get valid element | |||||
| valid_gt_cls = torch.masked_select(gt_cls,mask) | |||||
| valid_prob_cls = torch.masked_select(prob_cls,mask) | |||||
| size = min(valid_gt_cls.size()[0], valid_prob_cls.size()[0]) | |||||
| prob_ones = torch.ge(valid_prob_cls,0.6).float() | |||||
| right_ones = torch.eq(prob_ones,valid_gt_cls).float() | |||||
| return torch.div(torch.mul(torch.sum(right_ones),float(1.0)),float(size)) | |||||
| def train_pnet(model_store_path, end_epoch,imdb, | |||||
| batch_size,frequent=50,base_lr=0.01,use_cuda=True): | |||||
| if not os.path.exists(model_store_path): | |||||
| os.makedirs(model_store_path) | |||||
| lossfn = LossFn() | |||||
| net = PNet(is_train=True, use_cuda=use_cuda) | |||||
| net.train() | |||||
| if use_cuda: | |||||
| net.cuda() | |||||
| optimizer = torch.optim.Adam(net.parameters(), lr=base_lr) | |||||
| train_data=TrainImageReader(imdb,12,batch_size,shuffle=True) | |||||
| for cur_epoch in range(1,end_epoch+1): | |||||
| train_data.reset() | |||||
| accuracy_list=[] | |||||
| cls_loss_list=[] | |||||
| bbox_loss_list=[] | |||||
| # landmark_loss_list=[] | |||||
| for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data): | |||||
| im_tensor = [ image_tools.convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ] | |||||
| im_tensor = torch.stack(im_tensor) | |||||
| im_tensor = Variable(im_tensor) | |||||
| gt_label = Variable(torch.from_numpy(gt_label).float()) | |||||
| gt_bbox = Variable(torch.from_numpy(gt_bbox).float()) | |||||
| # gt_landmark = Variable(torch.from_numpy(gt_landmark).float()) | |||||
| if use_cuda: | |||||
| im_tensor = im_tensor.cuda() | |||||
| gt_label = gt_label.cuda() | |||||
| gt_bbox = gt_bbox.cuda() | |||||
| # gt_landmark = gt_landmark.cuda() | |||||
| cls_pred, box_offset_pred = net(im_tensor) | |||||
| # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred) | |||||
| cls_loss = lossfn.cls_loss(gt_label,cls_pred) | |||||
| box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred) | |||||
| # landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred) | |||||
| all_loss = cls_loss*1.0+box_offset_loss*0.5 | |||||
| if batch_idx%frequent==0: | |||||
| accuracy=compute_accuracy(cls_pred,gt_label) | |||||
| show1 = accuracy.data.tolist()[0] | |||||
| show2 = cls_loss.data.tolist()[0] | |||||
| show3 = box_offset_loss.data.tolist()[0] | |||||
| show5 = all_loss.data.tolist()[0] | |||||
| print "%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(),cur_epoch,batch_idx, show1,show2,show3,show5,base_lr) | |||||
| accuracy_list.append(accuracy) | |||||
| cls_loss_list.append(cls_loss) | |||||
| bbox_loss_list.append(box_offset_loss) | |||||
| optimizer.zero_grad() | |||||
| all_loss.backward() | |||||
| optimizer.step() | |||||
| accuracy_avg = torch.mean(torch.cat(accuracy_list)) | |||||
| cls_loss_avg = torch.mean(torch.cat(cls_loss_list)) | |||||
| bbox_loss_avg = torch.mean(torch.cat(bbox_loss_list)) | |||||
| # landmark_loss_avg = torch.mean(torch.cat(landmark_loss_list)) | |||||
| show6 = accuracy_avg.data.tolist()[0] | |||||
| show7 = cls_loss_avg.data.tolist()[0] | |||||
| show8 = bbox_loss_avg.data.tolist()[0] | |||||
| print "Epoch: %d, accuracy: %s, cls loss: %s, bbox loss: %s" % (cur_epoch, show6, show7, show8) | |||||
| torch.save(net.state_dict(), os.path.join(model_store_path,"pnet_epoch_%d.pt" % cur_epoch)) | |||||
| torch.save(net, os.path.join(model_store_path,"pnet_epoch_model_%d.pkl" % cur_epoch)) | |||||
| def train_rnet(model_store_path, end_epoch,imdb, | |||||
| batch_size,frequent=50,base_lr=0.01,use_cuda=True): | |||||
| if not os.path.exists(model_store_path): | |||||
| os.makedirs(model_store_path) | |||||
| lossfn = LossFn() | |||||
| net = RNet(is_train=True, use_cuda=use_cuda) | |||||
| net.train() | |||||
| if use_cuda: | |||||
| net.cuda() | |||||
| optimizer = torch.optim.Adam(net.parameters(), lr=base_lr) | |||||
| train_data=TrainImageReader(imdb,24,batch_size,shuffle=True) | |||||
| for cur_epoch in range(1,end_epoch+1): | |||||
| train_data.reset() | |||||
| accuracy_list=[] | |||||
| cls_loss_list=[] | |||||
| bbox_loss_list=[] | |||||
| landmark_loss_list=[] | |||||
| for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data): | |||||
| im_tensor = [ image_tools.convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ] | |||||
| im_tensor = torch.stack(im_tensor) | |||||
| im_tensor = Variable(im_tensor) | |||||
| gt_label = Variable(torch.from_numpy(gt_label).float()) | |||||
| gt_bbox = Variable(torch.from_numpy(gt_bbox).float()) | |||||
| gt_landmark = Variable(torch.from_numpy(gt_landmark).float()) | |||||
| if use_cuda: | |||||
| im_tensor = im_tensor.cuda() | |||||
| gt_label = gt_label.cuda() | |||||
| gt_bbox = gt_bbox.cuda() | |||||
| gt_landmark = gt_landmark.cuda() | |||||
| cls_pred, box_offset_pred = net(im_tensor) | |||||
| # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred) | |||||
| cls_loss = lossfn.cls_loss(gt_label,cls_pred) | |||||
| box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred) | |||||
| # landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred) | |||||
| all_loss = cls_loss*1.0+box_offset_loss*0.5 | |||||
| if batch_idx%frequent==0: | |||||
| accuracy=compute_accuracy(cls_pred,gt_label) | |||||
| show1 = accuracy.data.tolist()[0] | |||||
| show2 = cls_loss.data.tolist()[0] | |||||
| show3 = box_offset_loss.data.tolist()[0] | |||||
| # show4 = landmark_loss.data.tolist()[0] | |||||
| show5 = all_loss.data.tolist()[0] | |||||
| print "%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(), cur_epoch, batch_idx, show1, show2, show3, show5, base_lr) | |||||
| accuracy_list.append(accuracy) | |||||
| cls_loss_list.append(cls_loss) | |||||
| bbox_loss_list.append(box_offset_loss) | |||||
| # landmark_loss_list.append(landmark_loss) | |||||
| optimizer.zero_grad() | |||||
| all_loss.backward() | |||||
| optimizer.step() | |||||
| accuracy_avg = torch.mean(torch.cat(accuracy_list)) | |||||
| cls_loss_avg = torch.mean(torch.cat(cls_loss_list)) | |||||
| bbox_loss_avg = torch.mean(torch.cat(bbox_loss_list)) | |||||
| # landmark_loss_avg = torch.mean(torch.cat(landmark_loss_list)) | |||||
| show6 = accuracy_avg.data.tolist()[0] | |||||
| show7 = cls_loss_avg.data.tolist()[0] | |||||
| show8 = bbox_loss_avg.data.tolist()[0] | |||||
| # show9 = landmark_loss_avg.data.tolist()[0] | |||||
| print "Epoch: %d, accuracy: %s, cls loss: %s, bbox loss: %s" % (cur_epoch, show6, show7, show8) | |||||
| torch.save(net.state_dict(), os.path.join(model_store_path,"rnet_epoch_%d.pt" % cur_epoch)) | |||||
| torch.save(net, os.path.join(model_store_path,"rnet_epoch_model_%d.pkl" % cur_epoch)) | |||||
| def train_onet(model_store_path, end_epoch,imdb, | |||||
| batch_size,frequent=50,base_lr=0.01,use_cuda=True): | |||||
| if not os.path.exists(model_store_path): | |||||
| os.makedirs(model_store_path) | |||||
| lossfn = LossFn() | |||||
| net = ONet(is_train=True, use_cuda=use_cuda) | |||||
| net.train() | |||||
| if use_cuda: | |||||
| net.cuda() | |||||
| optimizer = torch.optim.Adam(net.parameters(), lr=base_lr) | |||||
| train_data=TrainImageReader(imdb,48,batch_size,shuffle=True) | |||||
| for cur_epoch in range(1,end_epoch+1): | |||||
| train_data.reset() | |||||
| accuracy_list=[] | |||||
| cls_loss_list=[] | |||||
| bbox_loss_list=[] | |||||
| landmark_loss_list=[] | |||||
| for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data): | |||||
| im_tensor = [ image_tools.convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ] | |||||
| im_tensor = torch.stack(im_tensor) | |||||
| im_tensor = Variable(im_tensor) | |||||
| gt_label = Variable(torch.from_numpy(gt_label).float()) | |||||
| gt_bbox = Variable(torch.from_numpy(gt_bbox).float()) | |||||
| gt_landmark = Variable(torch.from_numpy(gt_landmark).float()) | |||||
| if use_cuda: | |||||
| im_tensor = im_tensor.cuda() | |||||
| gt_label = gt_label.cuda() | |||||
| gt_bbox = gt_bbox.cuda() | |||||
| gt_landmark = gt_landmark.cuda() | |||||
| cls_pred, box_offset_pred, landmark_offset_pred = net(im_tensor) | |||||
| # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred) | |||||
| cls_loss = lossfn.cls_loss(gt_label,cls_pred) | |||||
| box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred) | |||||
| landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred) | |||||
| all_loss = cls_loss*0.8+box_offset_loss*0.6+landmark_loss*1.5 | |||||
| if batch_idx%frequent==0: | |||||
| accuracy=compute_accuracy(cls_pred,gt_label) | |||||
| show1 = accuracy.data.tolist()[0] | |||||
| show2 = cls_loss.data.tolist()[0] | |||||
| show3 = box_offset_loss.data.tolist()[0] | |||||
| show4 = landmark_loss.data.tolist()[0] | |||||
| show5 = all_loss.data.tolist()[0] | |||||
| print "%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, landmark loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(),cur_epoch,batch_idx, show1,show2,show3,show4,show5,base_lr) | |||||
| accuracy_list.append(accuracy) | |||||
| cls_loss_list.append(cls_loss) | |||||
| bbox_loss_list.append(box_offset_loss) | |||||
| landmark_loss_list.append(landmark_loss) | |||||
| optimizer.zero_grad() | |||||
| all_loss.backward() | |||||
| optimizer.step() | |||||
| accuracy_avg = torch.mean(torch.cat(accuracy_list)) | |||||
| cls_loss_avg = torch.mean(torch.cat(cls_loss_list)) | |||||
| bbox_loss_avg = torch.mean(torch.cat(bbox_loss_list)) | |||||
| landmark_loss_avg = torch.mean(torch.cat(landmark_loss_list)) | |||||
| show6 = accuracy_avg.data.tolist()[0] | |||||
| show7 = cls_loss_avg.data.tolist()[0] | |||||
| show8 = bbox_loss_avg.data.tolist()[0] | |||||
| show9 = landmark_loss_avg.data.tolist()[0] | |||||
| print "Epoch: %d, accuracy: %s, cls loss: %s, bbox loss: %s, landmark loss: %s " % (cur_epoch, show6, show7, show8, show9) | |||||
| torch.save(net.state_dict(), os.path.join(model_store_path,"onet_epoch_%d.pt" % cur_epoch)) | |||||
| torch.save(net, os.path.join(model_store_path,"onet_epoch_model_%d.pkl" % cur_epoch)) | |||||
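`compute_accuracy` above scores only samples whose ground-truth label is >= 0 (negatives 0 and positives 1), ignoring part-face and landmark samples tagged with negative labels. The same masking logic in plain numpy, as an illustration only (`masked_accuracy` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def masked_accuracy(prob_cls, gt_cls, threshold=0.6):
    """Classification accuracy over samples with label >= 0, thresholding
    the face probability at 0.6 as compute_accuracy does above."""
    gt_cls = np.asarray(gt_cls, dtype=float)
    prob_cls = np.asarray(prob_cls, dtype=float)
    mask = gt_cls >= 0                         # drop part/landmark samples
    pred = (prob_cls[mask] >= threshold).astype(float)
    return float(np.mean(pred == gt_cls[mask]))

labels = [1, 0, -1, 1, -2, 0]                  # -1 / -2 are excluded
probs = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7]
acc = masked_accuracy(probs, labels)
```

Masking keeps the accuracy metric meaningful: part faces and landmark crops carry no face/non-face label, so counting them would only add noise.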
| @@ -0,0 +1,50 @@ | |||||
| import argparse | |||||
| import sys | |||||
| from core.imagedb import ImageDB | |||||
| import train as train | |||||
| import config | |||||
| import os | |||||
| def train_net(annotation_file, model_store_path, | |||||
| end_epoch=16, frequent=200, lr=0.01, batch_size=128, use_cuda=False): | |||||
| imagedb = ImageDB(annotation_file) | |||||
| gt_imdb = imagedb.load_imdb() | |||||
| gt_imdb = imagedb.append_flipped_images(gt_imdb) | |||||
| train.train_onet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr, use_cuda=use_cuda) | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Train ONet', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', | |||||
| default=os.path.join(config.ANNO_STORE_DIR,config.ONET_TRAIN_IMGLIST_FILENAME), help='training data annotation file', type=str) | |||||
| parser.add_argument('--model_path', dest='model_store_path', help='training model store directory', | |||||
| default=config.MODEL_STORE_DIR, type=str) | |||||
| parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training', | |||||
| default=config.END_EPOCH, type=int) | |||||
| parser.add_argument('--frequent', dest='frequent', help='frequency of logging', | |||||
| default=200, type=int) | |||||
| parser.add_argument('--lr', dest='lr', help='learning rate', | |||||
| default=0.002, type=float) | |||||
| parser.add_argument('--batch_size', dest='batch_size', help='train batch size', | |||||
| default=1000, type=int) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='train with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='training data annotation images prefix root path', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| print 'train ONet arguments:' | |||||
| print args | |||||
| train_net(annotation_file=args.annotation_file, model_store_path=args.model_store_path, | |||||
| end_epoch=args.end_epoch, frequent=args.frequent, lr=args.lr, batch_size=args.batch_size, use_cuda=args.use_cuda) | |||||
| @@ -0,0 +1,49 @@ | |||||
| import argparse | |||||
| import sys | |||||
| from core.imagedb import ImageDB | |||||
| from train import train_pnet | |||||
| import config | |||||
| import os | |||||
| def train_net(annotation_file, model_store_path, | |||||
| end_epoch=16, frequent=200, lr=0.01, batch_size=128, use_cuda=False): | |||||
| imagedb = ImageDB(annotation_file) | |||||
| gt_imdb = imagedb.load_imdb() | |||||
| gt_imdb = imagedb.append_flipped_images(gt_imdb) | |||||
| train_pnet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr, use_cuda=use_cuda) | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Train PNet', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', | |||||
| default=os.path.join(config.ANNO_STORE_DIR,config.PNET_TRAIN_IMGLIST_FILENAME), help='training data annotation file', type=str) | |||||
| parser.add_argument('--model_path', dest='model_store_path', help='training model store directory', | |||||
| default=config.MODEL_STORE_DIR, type=str) | |||||
| parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training', | |||||
| default=config.END_EPOCH, type=int) | |||||
| parser.add_argument('--frequent', dest='frequent', help='frequency of logging', | |||||
| default=200, type=int) | |||||
| parser.add_argument('--lr', dest='lr', help='learning rate', | |||||
| default=config.TRAIN_LR, type=float) | |||||
| parser.add_argument('--batch_size', dest='batch_size', help='train batch size', | |||||
| default=config.TRAIN_BATCH_SIZE, type=int) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='train with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='training data annotation images prefix root path', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| print 'train PNet arguments:' | |||||
| print args | |||||
| train_net(annotation_file=args.annotation_file, model_store_path=args.model_store_path, | |||||
| end_epoch=args.end_epoch, frequent=args.frequent, lr=args.lr, batch_size=args.batch_size, use_cuda=args.use_cuda) | |||||
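Note that `--gpu ... type=bool` in the training scripts above is a common argparse pitfall: `bool('False')` is `True`, so any non-empty value on the command line enables CUDA. A hedged workaround using a hypothetical `str2bool` converter (not present in the repo):

```python
import argparse

def str2bool(value):
    """Parse common true/false spellings; plain bool() would treat any
    non-empty string, including 'False', as True."""
    if value.lower() in ('1', 'true', 'yes', 'y'):
        return True
    if value.lower() in ('0', 'false', 'no', 'n'):
        return False
    raise argparse.ArgumentTypeError('expected a boolean, got %r' % value)

parser = argparse.ArgumentParser()
parser.add_argument('--gpu', dest='use_cuda', type=str2bool, default=False)
args = parser.parse_args(['--gpu', 'False'])
```

With `type=bool`, the same invocation would silently set `use_cuda=True`.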
| @@ -0,0 +1,50 @@ | |||||
| import argparse | |||||
| import sys | |||||
| from core.imagedb import ImageDB | |||||
| import train as train | |||||
| import config | |||||
| import os | |||||
| def train_net(annotation_file, model_store_path, | |||||
| end_epoch=16, frequent=200, lr=0.01, batch_size=128, use_cuda=False): | |||||
| imagedb = ImageDB(annotation_file) | |||||
| gt_imdb = imagedb.load_imdb() | |||||
| gt_imdb = imagedb.append_flipped_images(gt_imdb) | |||||
| train.train_rnet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr, use_cuda=use_cuda) | |||||
| def parse_args(): | |||||
|     parser = argparse.ArgumentParser(description='Train RNet', | |||||
|                                      formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
|     parser.add_argument('--anno_file', dest='annotation_file', | |||||
|                         default=os.path.join(config.ANNO_STORE_DIR, config.RNET_TRAIN_IMGLIST_FILENAME), help='training data annotation file', type=str) | |||||
|     parser.add_argument('--model_path', dest='model_store_path', help='training model store directory', | |||||
|                         default=config.MODEL_STORE_DIR, type=str) | |||||
|     parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training', | |||||
|                         default=config.END_EPOCH, type=int) | |||||
|     parser.add_argument('--frequent', dest='frequent', help='frequency of logging', | |||||
|                         default=200, type=int) | |||||
|     parser.add_argument('--lr', dest='lr', help='learning rate', | |||||
|                         default=config.TRAIN_LR, type=float) | |||||
|     parser.add_argument('--batch_size', dest='batch_size', help='train batch size', | |||||
|                         default=config.TRAIN_BATCH_SIZE, type=int) | |||||
|     # NOTE: argparse's type=bool treats any non-empty string (even "False") as True, | |||||
|     # so parse the flag value explicitly | |||||
|     parser.add_argument('--gpu', dest='use_cuda', help='train with gpu', | |||||
|                         default=config.USE_CUDA, type=lambda v: str(v).lower() in ('true', '1', 'yes')) | |||||
|     parser.add_argument('--prefix_path', dest='prefix_path', help='training data annotation images prefix root path', type=str) | |||||
|     args = parser.parse_args() | |||||
|     return args | |||||
| if __name__ == '__main__': | |||||
|     args = parse_args() | |||||
|     print('train Rnet argument:') | |||||
|     print(args) | |||||
|     train_net(annotation_file=args.annotation_file, model_store_path=args.model_store_path, | |||||
|               end_epoch=args.end_epoch, frequent=args.frequent, lr=args.lr, batch_size=args.batch_size, use_cuda=args.use_cuda) | |||||
| @@ -0,0 +1,20 @@ | |||||
| import cv2 | |||||
| from core.detect import create_mtcnn_net, MtcnnDetector | |||||
| import core.vision as vision | |||||
| if __name__ == '__main__': | |||||
|     pnet, rnet, onet = create_mtcnn_net(p_model_path="./model_store/pnet_epoch_5best.pt", r_model_path="./model_store/rnet_epoch_1.pt", o_model_path="./model_store/onet_epoch_7bbest.pt", use_cuda=True) | |||||
|     mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, onet=onet, min_face_size=24) | |||||
|     img = cv2.imread("./test.jpg") | |||||
|     # OpenCV loads images as BGR; convert to RGB for visualization | |||||
|     img2 = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) | |||||
|     # detection runs on the original BGR image | |||||
|     bboxs, landmarks = mtcnn_detector.detect_face(img) | |||||
|     vision.vis_face(img2, bboxs, landmarks) | |||||
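The `cv2.split`/`cv2.merge([r, g, b])` pair in the test script simply reverses the channel order from BGR to RGB. The same reordering can be expressed as a numpy slice, sketched here on a synthetic array so it runs without OpenCV:

```python
import numpy as np

# a tiny synthetic 2x2 "image" with 3 channels, standing in for a BGR frame
bgr = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# reverse the last axis: channel order B,G,R -> R,G,B
rgb = bgr[:, :, ::-1]

# equivalent to splitting the channels and re-merging them in reverse,
# i.e. what cv2.split followed by cv2.merge([r, g, b]) does
b, g, r = bgr[:, :, 0], bgr[:, :, 1], bgr[:, :, 2]
rgb_manual = np.stack([r, g, b], axis=-1)

assert np.array_equal(rgb, rgb_manual)
```

`cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` performs the same conversion and also guarantees a contiguous result, which some downstream consumers require.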