@@ -0,0 +1,137 @@
| <div align=center> | |||||
| <img src="http://affluent.oss-cn-hangzhou.aliyuncs.com/html/images/dface_logo.png" width="350"> | |||||
| </div> | |||||
| ----------------- | |||||
| # DFace • [](https://opensource.org/licenses/Apache-2.0) [](https://gitter.im/cmusatyalab/DFace) | |||||
| | **`Linux CPU`** | **`Linux GPU`** | **`Mac OS CPU`** | **`Windows CPU`** | | |||||
| |-----------------|---------------------|------------------|-------------------| | |||||
| | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | | |||||
**Real-time multi-face detection and recognition based on multi-task cascaded convolutional networks (MTCNN) and Center-Loss.**
**DFace** is an open-source deep-learning face detection and recognition system. All features are implemented with the **[pytorch](https://github.com/pytorch/pytorch)** framework. PyTorch is a deep-learning framework developed by Facebook that offers interesting advanced features such as automatic differentiation and dynamic graph construction. DFace naturally inherits these strengths, which makes training simpler and keeps the code clear and easy to understand.
DFace can use CUDA for GPU acceleration. We recommend trying the Linux GPU mode, which runs at close to real-time speed.
All of the inspiration comes from recent academic work, such as [Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks](https://arxiv.org/abs/1604.02878) and [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/abs/1503.03832)
**MTCNN Structure**
|  | |||||
**If you are interested in DFace and want to contribute, please review the CONTRIBUTING.md in the project root; it keeps a live list of @TODO items. We use [issues](https://github.com/DFace/DFace/issues) to track and follow up on all problems.**
## Installation
DFace has two major modules: face detection and face recognition. I provide detailed steps for training and running every model. You first need to set up a Python environment with pytorch and cv2; I recommend Anaconda for creating an independent virtual environment.
### Requirements
| * cuda 8.0 | |||||
| * anaconda | |||||
| * pytorch | |||||
| * torchvision | |||||
| * cv2 | |||||
| * matplotlib | |||||
Here I provide an Anaconda environment dependency file, environment.yml, which makes it easy to build your own virtual environment:
| ```shell | |||||
| conda env create -f path/to/environment.yml | |||||
| ``` | |||||
### Face Detection
If you are interested in the MTCNN model, the following steps may help you.
#### Train the MTCNN Model
MTCNN consists of three networks, called **PNet**, **RNet** and **ONet**, so training proceeds in three successive stages. For better results, each network being trained depends on the previously trained network to generate its training data. All face datasets come from **[WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)** and **[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)**. WIDER FACE only provides a large number of face bounding-box annotations, while CelebA also includes facial landmark annotations.
* Generate PNet training data and annotation files
| ```shell | |||||
| python src/prepare_data/gen_Pnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} | |||||
| ``` | |||||
* Assemble and shuffle the annotation files
| ```shell | |||||
| python src/prepare_data/assemble_pnet_imglist.py | |||||
| ``` | |||||
* Train the PNet model
| ```shell | |||||
| python src/train_net/train_p_net.py | |||||
| ``` | |||||
* Generate RNet training data and annotation files
| ```shell | |||||
python src/prepare_data/gen_Rnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before}
| ``` | |||||
* Assemble and shuffle the annotation files
| ```shell | |||||
| python src/prepare_data/assemble_rnet_imglist.py | |||||
| ``` | |||||
* Train the RNet model
| ```shell | |||||
| python src/train_net/train_r_net.py | |||||
| ``` | |||||
* Generate ONet training data and annotation files
| ```shell | |||||
python src/prepare_data/gen_Onet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before} --rmodel_file {your RNet model file trained before}
| ``` | |||||
* Generate ONet landmark training data and annotation files
| ```shell | |||||
| python src/prepare_data/gen_landmark_48.py | |||||
| ``` | |||||
* Assemble and shuffle the annotation files (including landmarks)
| ```shell | |||||
| python src/prepare_data/assemble_onet_imglist.py | |||||
| ``` | |||||
* Train the ONet model
| ```shell | |||||
| python src/train_net/train_o_net.py | |||||
| ``` | |||||
#### Test Face Detection
| ```shell | |||||
| python test_image.py | |||||
| ``` | |||||
### Face Recognition
| TODO | |||||
## Demo
|  | |||||
| ## License | |||||
| [Apache License 2.0](LICENSE) | |||||
| ## Reference | |||||
| * [Seanlinx/mtcnn](https://github.com/Seanlinx/mtcnn) | |||||
@@ -0,0 +1,141 @@
| <div align=center> | |||||
| <a href="http://dface.io" target="_blank"><img src="http://pic.dface.io/dfacelogoblue.png" width="350"></a> | |||||
| </div> | |||||
| ----------------- | |||||
| # DFace • [](https://opensource.org/licenses/Apache-2.0) [](https://gitter.im/cmusatyalab/DFace) | |||||
| | **`Linux CPU`** | **`Linux GPU`** | **`Mac OS CPU`** | **`Windows CPU`** | | |||||
| |-----------------|---------------------|------------------|-------------------| | |||||
| | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | [](http://pic.dface.io/pass.svg) | | |||||
**Free and open-source face detection and recognition with
deep learning, based on MTCNN and a ResNet trained with Center-Loss.**
| [中文版 README](https://github.com/kuaikuaikim/DFace/blob/master/README_zh.md) | |||||
**DFace** is open-source software for face detection and recognition. All features are implemented with **[pytorch](https://github.com/pytorch/pytorch)**, the deep-learning framework from Facebook. PyTorch uses a technique called reverse-mode auto-differentiation, which allows developers to change the way a network behaves arbitrarily with zero lag or overhead.
DFace inherits these characteristics, which keeps it dynamic and makes its code easy to review.
DFace supports GPU acceleration with NVIDIA CUDA. We highly recommend the Linux GPU version; it is very fast and runs in real time.
Our inspiration comes from several research papers on this topic, as well as current and past work such as [Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks](https://arxiv.org/abs/1604.02878) and, on face recognition, [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/abs/1503.03832)
| **MTCNN Structure** | |||||
|  | |||||
**If you want to contribute to DFace, please review the CONTRIBUTING.md in the project. We use [GitHub issues](https://github.com/DFace/DFace/issues) for
tracking requests and bugs.**
| ## Installation | |||||
DFace has two major modules: detection and recognition. For both, we provide tutorials covering model training and inference.
First, set up pytorch and cv2. We suggest Anaconda for creating an independent virtual Python environment.
| ### Requirements | |||||
| * cuda 8.0 | |||||
| * anaconda | |||||
| * pytorch | |||||
| * torchvision | |||||
| * cv2 | |||||
| * matplotlib | |||||
We also provide an Anaconda environment dependency list, environment.yml, in the root path,
so you can create your DFace environment very easily:
| ```shell | |||||
| conda env create -f path/to/environment.yml | |||||
| ``` | |||||
### Face Detection
If you are interested in how to train an MTCNN model, follow the steps below.
#### Train the MTCNN Model
MTCNN consists of three networks, called **PNet**, **RNet** and **ONet**, so training proceeds in three stages. Each stage depends on the previously trained network, which generates the training data fed to the current one.
Please download the training face **datasets** before training. We use **[WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)** and **[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)**
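Besides classification, each network also regresses bounding-box offsets that calibrate the candidate boxes (the `boxes_align` computation in this repository's detector). A minimal numpy sketch with illustrative values:

```python
import numpy as np

# one candidate box [x1, y1, x2, y2] and its regression offsets from the net
box = np.array([[0.0, 0.0, 99.0, 99.0]])
reg = np.array([[0.1, 0.05, -0.1, -0.05]])

bw = box[:, 2] - box[:, 0] + 1   # box width  = 100
bh = box[:, 3] - box[:, 1] + 1   # box height = 100

# offsets are expressed as fractions of the box size
aligned = box.copy()
aligned[:, 0] += reg[:, 0] * bw  # x1: 0  + 0.1  * 100 = 10
aligned[:, 1] += reg[:, 1] * bh  # y1: 0  + 0.05 * 100 = 5
aligned[:, 2] += reg[:, 2] * bw  # x2: 99 - 0.1  * 100 = 89
aligned[:, 3] += reg[:, 3] * bh  # y2: 99 - 0.05 * 100 = 94
print(aligned)  # [[10.  5. 89. 94.]]
```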
* Generate PNet training data and annotation file
| ```shell | |||||
| python src/prepare_data/gen_Pnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} | |||||
| ``` | |||||
| * Assemble annotation file and shuffle it | |||||
| ```shell | |||||
| python src/prepare_data/assemble_pnet_imglist.py | |||||
| ``` | |||||
| * Train PNet model | |||||
| ```shell | |||||
| python src/train_net/train_p_net.py | |||||
| ``` | |||||
* Generate RNet training data and annotation file
| ```shell | |||||
python src/prepare_data/gen_Rnet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before}
| ``` | |||||
| * Assemble annotation file and shuffle it | |||||
| ```shell | |||||
| python src/prepare_data/assemble_rnet_imglist.py | |||||
| ``` | |||||
| * Train RNet model | |||||
| ```shell | |||||
| python src/train_net/train_r_net.py | |||||
| ``` | |||||
* Generate ONet training data and annotation file
| ```shell | |||||
python src/prepare_data/gen_Onet_train_data.py --dataset_path {your dataset path} --anno_file {your dataset original annotation path} --pmodel_file {your PNet model file trained before} --rmodel_file {your RNet model file trained before}
| ``` | |||||
* Generate ONet landmark training data and annotation file
| ```shell | |||||
| python src/prepare_data/gen_landmark_48.py | |||||
| ``` | |||||
| * Assemble annotation file and shuffle it | |||||
| ```shell | |||||
| python src/prepare_data/assemble_onet_imglist.py | |||||
| ``` | |||||
| * Train ONet model | |||||
| ```shell | |||||
| python src/train_net/train_o_net.py | |||||
| ``` | |||||
| #### Test face detection | |||||
| ```shell | |||||
| python test_image.py | |||||
| ``` | |||||
| ### Face Recognition | |||||
| TODO | |||||
| ## Demo | |||||
|  | |||||
| ## License | |||||
| [Apache License 2.0](LICENSE) | |||||
| ## Reference | |||||
| * [Seanlinx/mtcnn](https://github.com/Seanlinx/mtcnn) | |||||
@@ -0,0 +1 @@
This directory stores the annotation files of the training data
@@ -0,0 +1,66 @@
| name: pytorch | |||||
| channels: | |||||
| - soumith | |||||
| - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free | |||||
| - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ | |||||
| - defaults | |||||
| dependencies: | |||||
| - cairo=1.14.8=0 | |||||
| - certifi=2016.2.28=py27_0 | |||||
| - cffi=1.10.0=py27_0 | |||||
| - fontconfig=2.12.1=3 | |||||
| - freetype=2.5.5=2 | |||||
| - glib=2.50.2=1 | |||||
| - harfbuzz=0.9.39=2 | |||||
| - hdf5=1.8.17=2 | |||||
| - jbig=2.1=0 | |||||
| - jpeg=8d=2 | |||||
| - libffi=3.2.1=1 | |||||
| - libgcc=5.2.0=0 | |||||
| - libiconv=1.14=0 | |||||
| - libpng=1.6.30=1 | |||||
| - libtiff=4.0.6=2 | |||||
| - libxml2=2.9.4=0 | |||||
| - mkl=2017.0.3=0 | |||||
| - numpy=1.12.1=py27_0 | |||||
| - olefile=0.44=py27_0 | |||||
| - opencv=3.1.0=np112py27_1 | |||||
| - openssl=1.0.2l=0 | |||||
| - pcre=8.39=1 | |||||
| - pillow=3.4.2=py27_0 | |||||
| - pip=9.0.1=py27_1 | |||||
| - pixman=0.34.0=0 | |||||
| - pycparser=2.18=py27_0 | |||||
| - python=2.7.13=0 | |||||
| - readline=6.2=2 | |||||
| - setuptools=36.4.0=py27_1 | |||||
| - six=1.10.0=py27_0 | |||||
| - sqlite=3.13.0=0 | |||||
| - tk=8.5.18=0 | |||||
| - wheel=0.29.0=py27_0 | |||||
| - xz=5.2.3=0 | |||||
| - zlib=1.2.11=0 | |||||
| - cycler=0.10.0=py27_0 | |||||
| - dbus=1.10.20=0 | |||||
| - expat=2.1.0=0 | |||||
| - functools32=3.2.3.2=py27_0 | |||||
| - gst-plugins-base=1.8.0=0 | |||||
| - gstreamer=1.8.0=0 | |||||
| - icu=54.1=0 | |||||
| - libxcb=1.12=1 | |||||
| - matplotlib=2.0.2=np112py27_0 | |||||
| - pycairo=1.10.0=py27_0 | |||||
| - pyparsing=2.2.0=py27_0 | |||||
| - pyqt=5.6.0=py27_2 | |||||
| - python-dateutil=2.6.1=py27_0 | |||||
| - pytz=2017.2=py27_0 | |||||
| - qt=5.6.2=2 | |||||
| - sip=4.18=py27_0 | |||||
| - subprocess32=3.2.7=py27_0 | |||||
| - cuda80=1.0=0 | |||||
| - pytorch=0.2.0=py27hc03bea1_4cu80 | |||||
| - torchvision=0.1.9=py27hdb88a65_1 | |||||
| - pip: | |||||
| - torch==0.2.0.post4 | |||||
| prefix: /home/asy/.conda/envs/pytorch | |||||
@@ -0,0 +1 @@
| log dir | |||||
@@ -0,0 +1 @@
This directory stores trained model parameters and network structure
@@ -0,0 +1,42 @@
| import os | |||||
| MODEL_STORE_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/model_store" | |||||
| ANNO_STORE_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/anno_store" | |||||
| LOG_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))+"/log" | |||||
| USE_CUDA = True | |||||
| TRAIN_BATCH_SIZE = 512 | |||||
| TRAIN_LR = 0.01 | |||||
| END_EPOCH = 10 | |||||
| PNET_POSTIVE_ANNO_FILENAME = "pos_12.txt" | |||||
| PNET_NEGATIVE_ANNO_FILENAME = "neg_12.txt" | |||||
| PNET_PART_ANNO_FILENAME = "part_12.txt" | |||||
| PNET_LANDMARK_ANNO_FILENAME = "landmark_12.txt" | |||||
| RNET_POSTIVE_ANNO_FILENAME = "pos_24.txt" | |||||
| RNET_NEGATIVE_ANNO_FILENAME = "neg_24.txt" | |||||
| RNET_PART_ANNO_FILENAME = "part_24.txt" | |||||
| RNET_LANDMARK_ANNO_FILENAME = "landmark_24.txt" | |||||
| ONET_POSTIVE_ANNO_FILENAME = "pos_48.txt" | |||||
| ONET_NEGATIVE_ANNO_FILENAME = "neg_48.txt" | |||||
| ONET_PART_ANNO_FILENAME = "part_48.txt" | |||||
| ONET_LANDMARK_ANNO_FILENAME = "landmark_48.txt" | |||||
| PNET_TRAIN_IMGLIST_FILENAME = "imglist_anno_12.txt" | |||||
| RNET_TRAIN_IMGLIST_FILENAME = "imglist_anno_24.txt" | |||||
| ONET_TRAIN_IMGLIST_FILENAME = "imglist_anno_48.txt" | |||||
@@ -0,0 +1,632 @@
| import cv2 | |||||
| import time | |||||
| import numpy as np | |||||
| import torch | |||||
from torch.autograd import Variable
| from models import PNet,RNet,ONet | |||||
| import utils as utils | |||||
| import image_tools | |||||
| def create_mtcnn_net(p_model_path=None, r_model_path=None, o_model_path=None, use_cuda=True): | |||||
| pnet, rnet, onet = None, None, None | |||||
| if p_model_path is not None: | |||||
| pnet = PNet(use_cuda=use_cuda) | |||||
| pnet.load_state_dict(torch.load(p_model_path)) | |||||
| if(use_cuda): | |||||
| pnet.cuda() | |||||
| pnet.eval() | |||||
| if r_model_path is not None: | |||||
| rnet = RNet(use_cuda=use_cuda) | |||||
| rnet.load_state_dict(torch.load(r_model_path)) | |||||
| if (use_cuda): | |||||
| rnet.cuda() | |||||
| rnet.eval() | |||||
| if o_model_path is not None: | |||||
| onet = ONet(use_cuda=use_cuda) | |||||
| onet.load_state_dict(torch.load(o_model_path)) | |||||
| if (use_cuda): | |||||
| onet.cuda() | |||||
| onet.eval() | |||||
| return pnet,rnet,onet | |||||
| class MtcnnDetector(object): | |||||
| """ | |||||
| P,R,O net face detection and landmarks align | |||||
| """ | |||||
| def __init__(self, | |||||
| pnet = None, | |||||
| rnet = None, | |||||
| onet = None, | |||||
| min_face_size=12, | |||||
| stride=2, | |||||
| threshold=[0.6, 0.7, 0.7], | |||||
| scale_factor=0.709, | |||||
| ): | |||||
| self.pnet_detector = pnet | |||||
| self.rnet_detector = rnet | |||||
| self.onet_detector = onet | |||||
| self.min_face_size = min_face_size | |||||
| self.stride=stride | |||||
| self.thresh = threshold | |||||
| self.scale_factor = scale_factor | |||||
| def unique_image_format(self,im): | |||||
| if not isinstance(im,np.ndarray): | |||||
| if im.mode == 'I': | |||||
| im = np.array(im, np.int32, copy=False) | |||||
| elif im.mode == 'I;16': | |||||
| im = np.array(im, np.int16, copy=False) | |||||
| else: | |||||
| im = np.asarray(im) | |||||
| return im | |||||
| def square_bbox(self, bbox): | |||||
| """ | |||||
| convert bbox to square | |||||
| Parameters: | |||||
| ---------- | |||||
| bbox: numpy array , shape n x m | |||||
| input bbox | |||||
| Returns: | |||||
| ------- | |||||
| square bbox | |||||
| """ | |||||
| square_bbox = bbox.copy() | |||||
| h = bbox[:, 3] - bbox[:, 1] + 1 | |||||
| w = bbox[:, 2] - bbox[:, 0] + 1 | |||||
| l = np.maximum(h,w) | |||||
| square_bbox[:, 0] = bbox[:, 0] + w*0.5 - l*0.5 | |||||
| square_bbox[:, 1] = bbox[:, 1] + h*0.5 - l*0.5 | |||||
| square_bbox[:, 2] = square_bbox[:, 0] + l - 1 | |||||
| square_bbox[:, 3] = square_bbox[:, 1] + l - 1 | |||||
| return square_bbox | |||||
| def generate_bounding_box(self, map, reg, scale, threshold): | |||||
| """ | |||||
| generate bbox from feature map | |||||
| Parameters: | |||||
| ---------- | |||||
| map: numpy array , n x m x 1 | |||||
| detect score for each position | |||||
| reg: numpy array , n x m x 4 | |||||
| bbox | |||||
| scale: float number | |||||
| scale of this detection | |||||
| threshold: float number | |||||
| detect threshold | |||||
| Returns: | |||||
| ------- | |||||
| bbox array | |||||
| """ | |||||
| stride = 2 | |||||
| cellsize = 12 | |||||
| t_index = np.where(map > threshold) | |||||
| # find nothing | |||||
| if t_index[0].size == 0: | |||||
| return np.array([]) | |||||
| dx1, dy1, dx2, dy2 = [reg[0, t_index[0], t_index[1], i] for i in range(4)] | |||||
| reg = np.array([dx1, dy1, dx2, dy2]) | |||||
| # lefteye_dx, lefteye_dy, righteye_dx, righteye_dy, nose_dx, nose_dy, \ | |||||
| # leftmouth_dx, leftmouth_dy, rightmouth_dx, rightmouth_dy = [landmarks[0, t_index[0], t_index[1], i] for i in range(10)] | |||||
| # | |||||
| # landmarks = np.array([lefteye_dx, lefteye_dy, righteye_dx, righteye_dy, nose_dx, nose_dy, leftmouth_dx, leftmouth_dy, rightmouth_dx, rightmouth_dy]) | |||||
| score = map[t_index[0], t_index[1], 0] | |||||
| boundingbox = np.vstack([np.round((stride * t_index[1]) / scale), | |||||
| np.round((stride * t_index[0]) / scale), | |||||
| np.round((stride * t_index[1] + cellsize) / scale), | |||||
| np.round((stride * t_index[0] + cellsize) / scale), | |||||
| score, | |||||
| reg, | |||||
| # landmarks | |||||
| ]) | |||||
| return boundingbox.T | |||||
| def resize_image(self, img, scale): | |||||
| """ | |||||
resize image by the given scale factor
| Parameters: | |||||
| ---------- | |||||
| img: numpy array , height x width x channel | |||||
| input image, channels in BGR order here | |||||
| scale: float number | |||||
| scale factor of resize operation | |||||
| Returns: | |||||
| ------- | |||||
img_resized: numpy array, new_height x new_width x channel, the resized image
| """ | |||||
| height, width, channels = img.shape | |||||
| new_height = int(height * scale) # resized new height | |||||
| new_width = int(width * scale) # resized new width | |||||
| new_dim = (new_width, new_height) | |||||
| img_resized = cv2.resize(img, new_dim, interpolation=cv2.INTER_LINEAR) # resized image | |||||
| return img_resized | |||||
| def pad(self, bboxes, w, h): | |||||
| """ | |||||
pad the boxes
| Parameters: | |||||
| ---------- | |||||
| bboxes: numpy array, n x 5 | |||||
| input bboxes | |||||
| w: float number | |||||
| width of the input image | |||||
| h: float number | |||||
| height of the input image | |||||
| Returns : | |||||
| ------ | |||||
| dy, dx : numpy array, n x 1 | |||||
| start point of the bbox in target image | |||||
| edy, edx : numpy array, n x 1 | |||||
| end point of the bbox in target image | |||||
| y, x : numpy array, n x 1 | |||||
| start point of the bbox in original image | |||||
ey, ex : numpy array, n x 1
| end point of the bbox in original image | |||||
| tmph, tmpw: numpy array, n x 1 | |||||
| height and width of the bbox | |||||
| """ | |||||
| tmpw = (bboxes[:, 2] - bboxes[:, 0] + 1).astype(np.int32) | |||||
| tmph = (bboxes[:, 3] - bboxes[:, 1] + 1).astype(np.int32) | |||||
| numbox = bboxes.shape[0] | |||||
| dx = np.zeros((numbox, )) | |||||
| dy = np.zeros((numbox, )) | |||||
| edx, edy = tmpw.copy()-1, tmph.copy()-1 | |||||
| x, y, ex, ey = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3] | |||||
| tmp_index = np.where(ex > w-1) | |||||
| edx[tmp_index] = tmpw[tmp_index] + w - 2 - ex[tmp_index] | |||||
| ex[tmp_index] = w - 1 | |||||
| tmp_index = np.where(ey > h-1) | |||||
| edy[tmp_index] = tmph[tmp_index] + h - 2 - ey[tmp_index] | |||||
| ey[tmp_index] = h - 1 | |||||
| tmp_index = np.where(x < 0) | |||||
| dx[tmp_index] = 0 - x[tmp_index] | |||||
| x[tmp_index] = 0 | |||||
| tmp_index = np.where(y < 0) | |||||
| dy[tmp_index] = 0 - y[tmp_index] | |||||
| y[tmp_index] = 0 | |||||
| return_list = [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] | |||||
| return_list = [item.astype(np.int32) for item in return_list] | |||||
| return return_list | |||||
| def detect_pnet(self, im): | |||||
| """Get face candidates through pnet | |||||
| Parameters: | |||||
| ---------- | |||||
| im: numpy array | |||||
| input image array | |||||
| Returns: | |||||
| ------- | |||||
| boxes: numpy array | |||||
| detected boxes before calibration | |||||
| boxes_align: numpy array | |||||
| boxes after calibration | |||||
| """ | |||||
| # im = self.unique_image_format(im) | |||||
| h, w, c = im.shape | |||||
| net_size = 12 | |||||
| current_scale = float(net_size) / self.min_face_size # find initial scale | |||||
| im_resized = self.resize_image(im, current_scale) | |||||
| current_height, current_width, _ = im_resized.shape | |||||
| # fcn | |||||
| all_boxes = list() | |||||
| while min(current_height, current_width) > net_size: | |||||
| feed_imgs = [] | |||||
| image_tensor = image_tools.convert_image_to_tensor(im_resized) | |||||
| feed_imgs.append(image_tensor) | |||||
| feed_imgs = torch.stack(feed_imgs) | |||||
| feed_imgs = Variable(feed_imgs) | |||||
| if self.pnet_detector.use_cuda: | |||||
| feed_imgs = feed_imgs.cuda() | |||||
| cls_map, reg = self.pnet_detector(feed_imgs) | |||||
| cls_map_np = image_tools.convert_chwTensor_to_hwcNumpy(cls_map.cpu()) | |||||
| reg_np = image_tools.convert_chwTensor_to_hwcNumpy(reg.cpu()) | |||||
| # landmark_np = image_tools.convert_chwTensor_to_hwcNumpy(landmark.cpu()) | |||||
| boxes = self.generate_bounding_box(cls_map_np[ 0, :, :], reg_np, current_scale, self.thresh[0]) | |||||
| current_scale *= self.scale_factor | |||||
| im_resized = self.resize_image(im, current_scale) | |||||
| current_height, current_width, _ = im_resized.shape | |||||
| if boxes.size == 0: | |||||
| continue | |||||
| keep = utils.nms(boxes[:, :5], 0.5, 'Union') | |||||
| boxes = boxes[keep] | |||||
| all_boxes.append(boxes) | |||||
| if len(all_boxes) == 0: | |||||
| return None, None | |||||
| all_boxes = np.vstack(all_boxes) | |||||
| # merge the detection from first stage | |||||
| keep = utils.nms(all_boxes[:, 0:5], 0.7, 'Union') | |||||
| all_boxes = all_boxes[keep] | |||||
| # boxes = all_boxes[:, :5] | |||||
| bw = all_boxes[:, 2] - all_boxes[:, 0] + 1 | |||||
| bh = all_boxes[:, 3] - all_boxes[:, 1] + 1 | |||||
| # landmark_keep = all_boxes[:, 9:].reshape((5,2)) | |||||
| boxes = np.vstack([all_boxes[:,0], | |||||
| all_boxes[:,1], | |||||
| all_boxes[:,2], | |||||
| all_boxes[:,3], | |||||
| all_boxes[:,4], | |||||
| # all_boxes[:, 0] + all_boxes[:, 9] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:,10] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 11] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 12] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 13] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 14] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 15] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 16] * bh, | |||||
| # all_boxes[:, 0] + all_boxes[:, 17] * bw, | |||||
| # all_boxes[:, 1] + all_boxes[:, 18] * bh | |||||
| ]) | |||||
| boxes = boxes.T | |||||
| align_topx = all_boxes[:, 0] + all_boxes[:, 5] * bw | |||||
| align_topy = all_boxes[:, 1] + all_boxes[:, 6] * bh | |||||
| align_bottomx = all_boxes[:, 2] + all_boxes[:, 7] * bw | |||||
| align_bottomy = all_boxes[:, 3] + all_boxes[:, 8] * bh | |||||
| # refine the boxes | |||||
| boxes_align = np.vstack([ align_topx, | |||||
| align_topy, | |||||
| align_bottomx, | |||||
| align_bottomy, | |||||
| all_boxes[:, 4], | |||||
| # align_topx + all_boxes[:,9] * bw, | |||||
| # align_topy + all_boxes[:,10] * bh, | |||||
| # align_topx + all_boxes[:,11] * bw, | |||||
| # align_topy + all_boxes[:,12] * bh, | |||||
| # align_topx + all_boxes[:,13] * bw, | |||||
| # align_topy + all_boxes[:,14] * bh, | |||||
| # align_topx + all_boxes[:,15] * bw, | |||||
| # align_topy + all_boxes[:,16] * bh, | |||||
| # align_topx + all_boxes[:,17] * bw, | |||||
| # align_topy + all_boxes[:,18] * bh, | |||||
| ]) | |||||
| boxes_align = boxes_align.T | |||||
| return boxes, boxes_align | |||||
| def detect_rnet(self, im, dets): | |||||
| """Get face candidates using rnet | |||||
| Parameters: | |||||
| ---------- | |||||
| im: numpy array | |||||
| input image array | |||||
| dets: numpy array | |||||
| detection results of pnet | |||||
| Returns: | |||||
| ------- | |||||
| boxes: numpy array | |||||
| detected boxes before calibration | |||||
| boxes_align: numpy array | |||||
| boxes after calibration | |||||
| """ | |||||
| h, w, c = im.shape | |||||
| if dets is None: | |||||
| return None,None | |||||
| dets = self.square_bbox(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = self.pad(dets, w, h) | |||||
| num_boxes = dets.shape[0] | |||||
| ''' | |||||
| # helper for setting RNet batch size | |||||
| batch_size = self.rnet_detector.batch_size | |||||
| ratio = float(num_boxes) / batch_size | |||||
| if ratio > 3 or ratio < 0.3: | |||||
| print "You may need to reset RNet batch size if this info appears frequently, \ | |||||
| face candidates:%d, current batch_size:%d"%(num_boxes, batch_size) | |||||
| ''' | |||||
| # cropped_ims_tensors = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32) | |||||
| cropped_ims_tensors = [] | |||||
| for i in range(num_boxes): | |||||
| tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8) | |||||
| tmp[dy[i]:edy[i]+1, dx[i]:edx[i]+1, :] = im[y[i]:ey[i]+1, x[i]:ex[i]+1, :] | |||||
| crop_im = cv2.resize(tmp, (24, 24)) | |||||
| crop_im_tensor = image_tools.convert_image_to_tensor(crop_im) | |||||
| # cropped_ims_tensors[i, :, :, :] = crop_im_tensor | |||||
| cropped_ims_tensors.append(crop_im_tensor) | |||||
| feed_imgs = Variable(torch.stack(cropped_ims_tensors)) | |||||
| if self.rnet_detector.use_cuda: | |||||
| feed_imgs = feed_imgs.cuda() | |||||
| cls_map, reg = self.rnet_detector(feed_imgs) | |||||
| cls_map = cls_map.cpu().data.numpy() | |||||
| reg = reg.cpu().data.numpy() | |||||
| # landmark = landmark.cpu().data.numpy() | |||||
| keep_inds = np.where(cls_map > self.thresh[1])[0] | |||||
| if len(keep_inds) > 0: | |||||
| boxes = dets[keep_inds] | |||||
| cls = cls_map[keep_inds] | |||||
| reg = reg[keep_inds] | |||||
| # landmark = landmark[keep_inds] | |||||
| else: | |||||
| return None, None | |||||
| keep = utils.nms(boxes, 0.7) | |||||
| if len(keep) == 0: | |||||
| return None, None | |||||
| keep_cls = cls[keep] | |||||
| keep_boxes = boxes[keep] | |||||
| keep_reg = reg[keep] | |||||
| # keep_landmark = landmark[keep] | |||||
| bw = keep_boxes[:, 2] - keep_boxes[:, 0] + 1 | |||||
| bh = keep_boxes[:, 3] - keep_boxes[:, 1] + 1 | |||||
| boxes = np.vstack([ keep_boxes[:,0], | |||||
| keep_boxes[:,1], | |||||
| keep_boxes[:,2], | |||||
| keep_boxes[:,3], | |||||
| keep_cls[:,0], | |||||
| # keep_boxes[:,0] + keep_landmark[:, 0] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 1] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 2] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 3] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 4] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 5] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 6] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 7] * bh, | |||||
| # keep_boxes[:,0] + keep_landmark[:, 8] * bw, | |||||
| # keep_boxes[:,1] + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| align_topx = keep_boxes[:,0] + keep_reg[:,0] * bw | |||||
| align_topy = keep_boxes[:,1] + keep_reg[:,1] * bh | |||||
| align_bottomx = keep_boxes[:,2] + keep_reg[:,2] * bw | |||||
| align_bottomy = keep_boxes[:,3] + keep_reg[:,3] * bh | |||||
| boxes_align = np.vstack([align_topx, | |||||
| align_topy, | |||||
| align_bottomx, | |||||
| align_bottomy, | |||||
| keep_cls[:, 0], | |||||
| # align_topx + keep_landmark[:, 0] * bw, | |||||
| # align_topy + keep_landmark[:, 1] * bh, | |||||
| # align_topx + keep_landmark[:, 2] * bw, | |||||
| # align_topy + keep_landmark[:, 3] * bh, | |||||
| # align_topx + keep_landmark[:, 4] * bw, | |||||
| # align_topy + keep_landmark[:, 5] * bh, | |||||
| # align_topx + keep_landmark[:, 6] * bw, | |||||
| # align_topy + keep_landmark[:, 7] * bh, | |||||
| # align_topx + keep_landmark[:, 8] * bw, | |||||
| # align_topy + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| boxes = boxes.T | |||||
| boxes_align = boxes_align.T | |||||
| return boxes, boxes_align | |||||
| def detect_onet(self, im, dets): | |||||
| """Get face candidates using onet | |||||
| Parameters: | |||||
| ---------- | |||||
| im: numpy array | |||||
| input image array | |||||
| dets: numpy array | |||||
| detection results of rnet | |||||
| Returns: | |||||
| ------- | |||||
| boxes_align: numpy array | |||||
| boxes after calibration | |||||
| landmarks_align: numpy array | |||||
| landmarks after calibration | |||||
| """ | |||||
| h, w, c = im.shape | |||||
| if dets is None: | |||||
| return None, None | |||||
| dets = self.square_bbox(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| [dy, edy, dx, edx, y, ey, x, ex, tmpw, tmph] = self.pad(dets, w, h) | |||||
| num_boxes = dets.shape[0] | |||||
| # cropped_ims_tensors = np.zeros((num_boxes, 3, 24, 24), dtype=np.float32) | |||||
| cropped_ims_tensors = [] | |||||
| for i in range(num_boxes): | |||||
| tmp = np.zeros((tmph[i], tmpw[i], 3), dtype=np.uint8) | |||||
| tmp[dy[i]:edy[i] + 1, dx[i]:edx[i] + 1, :] = im[y[i]:ey[i] + 1, x[i]:ex[i] + 1, :] | |||||
| crop_im = cv2.resize(tmp, (48, 48)) | |||||
| crop_im_tensor = image_tools.convert_image_to_tensor(crop_im) | |||||
| # cropped_ims_tensors[i, :, :, :] = crop_im_tensor | |||||
| cropped_ims_tensors.append(crop_im_tensor) | |||||
| feed_imgs = Variable(torch.stack(cropped_ims_tensors)) | |||||
if self.onet_detector.use_cuda:
| feed_imgs = feed_imgs.cuda() | |||||
| cls_map, reg, landmark = self.onet_detector(feed_imgs) | |||||
| cls_map = cls_map.cpu().data.numpy() | |||||
| reg = reg.cpu().data.numpy() | |||||
| landmark = landmark.cpu().data.numpy() | |||||
| keep_inds = np.where(cls_map > self.thresh[2])[0] | |||||
| if len(keep_inds) > 0: | |||||
| boxes = dets[keep_inds] | |||||
| cls = cls_map[keep_inds] | |||||
| reg = reg[keep_inds] | |||||
| landmark = landmark[keep_inds] | |||||
| else: | |||||
| return None, None | |||||
| keep = utils.nms(boxes, 0.7, mode="Minimum") | |||||
| if len(keep) == 0: | |||||
| return None, None | |||||
| keep_cls = cls[keep] | |||||
| keep_boxes = boxes[keep] | |||||
| keep_reg = reg[keep] | |||||
| keep_landmark = landmark[keep] | |||||
| bw = keep_boxes[:, 2] - keep_boxes[:, 0] + 1 | |||||
| bh = keep_boxes[:, 3] - keep_boxes[:, 1] + 1 | |||||
| align_topx = keep_boxes[:, 0] + keep_reg[:, 0] * bw | |||||
| align_topy = keep_boxes[:, 1] + keep_reg[:, 1] * bh | |||||
| align_bottomx = keep_boxes[:, 2] + keep_reg[:, 2] * bw | |||||
| align_bottomy = keep_boxes[:, 3] + keep_reg[:, 3] * bh | |||||
| align_landmark_topx = keep_boxes[:, 0] | |||||
| align_landmark_topy = keep_boxes[:, 1] | |||||
| boxes_align = np.vstack([align_topx, | |||||
| align_topy, | |||||
| align_bottomx, | |||||
| align_bottomy, | |||||
| keep_cls[:, 0], | |||||
| # align_topx + keep_landmark[:, 0] * bw, | |||||
| # align_topy + keep_landmark[:, 1] * bh, | |||||
| # align_topx + keep_landmark[:, 2] * bw, | |||||
| # align_topy + keep_landmark[:, 3] * bh, | |||||
| # align_topx + keep_landmark[:, 4] * bw, | |||||
| # align_topy + keep_landmark[:, 5] * bh, | |||||
| # align_topx + keep_landmark[:, 6] * bw, | |||||
| # align_topy + keep_landmark[:, 7] * bh, | |||||
| # align_topx + keep_landmark[:, 8] * bw, | |||||
| # align_topy + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| boxes_align = boxes_align.T | |||||
| landmark = np.vstack([ | |||||
| align_landmark_topx + keep_landmark[:, 0] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 1] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 2] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 3] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 4] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 5] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 6] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 7] * bh, | |||||
| align_landmark_topx + keep_landmark[:, 8] * bw, | |||||
| align_landmark_topy + keep_landmark[:, 9] * bh, | |||||
| ]) | |||||
| landmark_align = landmark.T | |||||
| return boxes_align, landmark_align | |||||
| def detect_face(self,img): | |||||
| """Detect face over image | |||||
| """ | |||||
| boxes_align = np.array([]) | |||||
landmark_align = np.array([])
t = time.time()
t1 = t2 = t3 = 0  # stage timings; stay 0 if a stage detector is not configured
| # pnet | |||||
| if self.pnet_detector: | |||||
| boxes, boxes_align = self.detect_pnet(img) | |||||
| if boxes_align is None: | |||||
| return np.array([]), np.array([]) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| # rnet | |||||
| if self.rnet_detector: | |||||
| boxes, boxes_align = self.detect_rnet(img, boxes_align) | |||||
| if boxes_align is None: | |||||
| return np.array([]), np.array([]) | |||||
| t2 = time.time() - t | |||||
| t = time.time() | |||||
| # onet | |||||
| if self.onet_detector: | |||||
| boxes_align, landmark_align = self.detect_onet(img, boxes_align) | |||||
| if boxes_align is None: | |||||
| return np.array([]), np.array([]) | |||||
| t3 = time.time() - t | |||||
| t = time.time() | |||||
print("time cost " + '{:.3f}'.format(t1 + t2 + t3) + ' pnet {:.3f} rnet {:.3f} onet {:.3f}'.format(t1, t2, t3))
| return boxes_align, landmark_align | |||||
| @@ -0,0 +1,171 @@ | |||||
| import numpy as np | |||||
| import cv2 | |||||
| class TrainImageReader: | |||||
| def __init__(self, imdb, im_size, batch_size=128, shuffle=False): | |||||
| self.imdb = imdb | |||||
| self.batch_size = batch_size | |||||
| self.im_size = im_size | |||||
| self.shuffle = shuffle | |||||
| self.cur = 0 | |||||
| self.size = len(imdb) | |||||
| self.index = np.arange(self.size) | |||||
| self.num_classes = 2 | |||||
| self.batch = None | |||||
| self.data = None | |||||
| self.label = None | |||||
| self.label_names= ['label', 'bbox_target', 'landmark_target'] | |||||
| self.reset() | |||||
| self.get_batch() | |||||
| def reset(self): | |||||
| self.cur = 0 | |||||
| if self.shuffle: | |||||
| np.random.shuffle(self.index) | |||||
| def iter_next(self): | |||||
| return self.cur + self.batch_size <= self.size | |||||
| def __iter__(self): | |||||
| return self | |||||
| def __next__(self): | |||||
| return self.next() | |||||
| def next(self): | |||||
| if self.iter_next(): | |||||
| self.get_batch() | |||||
| self.cur += self.batch_size | |||||
| return self.data,self.label | |||||
| else: | |||||
| raise StopIteration | |||||
| def getindex(self): | |||||
return self.cur // self.batch_size
| def getpad(self): | |||||
| if self.cur + self.batch_size > self.size: | |||||
| return self.cur + self.batch_size - self.size | |||||
| else: | |||||
| return 0 | |||||
| def get_batch(self): | |||||
| cur_from = self.cur | |||||
| cur_to = min(cur_from + self.batch_size, self.size) | |||||
| imdb = [self.imdb[self.index[i]] for i in range(cur_from, cur_to)] | |||||
| data, label = get_minibatch(imdb) | |||||
| self.data = data['data'] | |||||
| self.label = [label[name] for name in self.label_names] | |||||
| class TestImageLoader: | |||||
| def __init__(self, imdb, batch_size=1, shuffle=False): | |||||
| self.imdb = imdb | |||||
| self.batch_size = batch_size | |||||
| self.shuffle = shuffle | |||||
| self.size = len(imdb) | |||||
| self.index = np.arange(self.size) | |||||
| self.cur = 0 | |||||
| self.data = None | |||||
| self.label = None | |||||
| self.reset() | |||||
| self.get_batch() | |||||
| def reset(self): | |||||
| self.cur = 0 | |||||
| if self.shuffle: | |||||
| np.random.shuffle(self.index) | |||||
| def iter_next(self): | |||||
| return self.cur + self.batch_size <= self.size | |||||
| def __iter__(self): | |||||
| return self | |||||
| def __next__(self): | |||||
| return self.next() | |||||
| def next(self): | |||||
| if self.iter_next(): | |||||
| self.get_batch() | |||||
| self.cur += self.batch_size | |||||
| return self.data | |||||
| else: | |||||
| raise StopIteration | |||||
| def getindex(self): | |||||
return self.cur // self.batch_size
| def getpad(self): | |||||
| if self.cur + self.batch_size > self.size: | |||||
| return self.cur + self.batch_size - self.size | |||||
| else: | |||||
| return 0 | |||||
| def get_batch(self): | |||||
| cur_from = self.cur | |||||
| cur_to = min(cur_from + self.batch_size, self.size) | |||||
| imdb = [self.imdb[self.index[i]] for i in range(cur_from, cur_to)] | |||||
| data= get_testbatch(imdb) | |||||
| self.data=data['data'] | |||||
| def get_minibatch(imdb): | |||||
| # im_size: 12, 24 or 48 | |||||
| num_images = len(imdb) | |||||
| processed_ims = list() | |||||
| cls_label = list() | |||||
| bbox_reg_target = list() | |||||
| landmark_reg_target = list() | |||||
| for i in range(num_images): | |||||
| im = cv2.imread(imdb[i]['image']) | |||||
| #im = Image.open(imdb[i]['image']) | |||||
| if imdb[i]['flipped']: | |||||
| im = im[:, ::-1, :] | |||||
| #im = im.transpose(Image.FLIP_LEFT_RIGHT) | |||||
| cls = imdb[i]['label'] | |||||
| bbox_target = imdb[i]['bbox_target'] | |||||
| landmark = imdb[i]['landmark_target'] | |||||
| processed_ims.append(im) | |||||
| cls_label.append(cls) | |||||
| bbox_reg_target.append(bbox_target) | |||||
| landmark_reg_target.append(landmark) | |||||
| im_array = np.asarray(processed_ims) | |||||
| label_array = np.array(cls_label) | |||||
| bbox_target_array = np.vstack(bbox_reg_target) | |||||
| landmark_target_array = np.vstack(landmark_reg_target) | |||||
| data = {'data': im_array} | |||||
| label = {'label': label_array, | |||||
| 'bbox_target': bbox_target_array, | |||||
| 'landmark_target': landmark_target_array | |||||
| } | |||||
| return data, label | |||||
| def get_testbatch(imdb): | |||||
| assert len(imdb) == 1, "Single batch only" | |||||
| im = cv2.imread(imdb[0]['image']) | |||||
| data = {'data': im} | |||||
| return data | |||||
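`TrainImageReader` above silently drops the tail items that do not fill a final batch (`iter_next` requires `cur + batch_size <= size`). A standalone toy reader illustrating the same iterator protocol (names here are illustrative, not part of the module):

```python
import numpy as np

class ToyReader:
    """Minimal sketch of the batching protocol used by TrainImageReader:
    fixed-size batches, StopIteration once fewer than batch_size remain."""
    def __init__(self, data, batch_size):
        self.data, self.batch_size, self.cur = data, batch_size, 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.cur + self.batch_size > len(self.data):
            raise StopIteration
        batch = self.data[self.cur:self.cur + self.batch_size]
        self.cur += self.batch_size
        return batch

batches = [b.tolist() for b in ToyReader(np.arange(10), 4)]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]] -- the last 2 items are dropped
```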
| @@ -0,0 +1,40 @@ | |||||
| import torchvision.transforms as transforms | |||||
| import torch | |||||
| from torch.autograd.variable import Variable | |||||
| import numpy as np | |||||
| transform = transforms.ToTensor() | |||||
| def convert_image_to_tensor(image): | |||||
| """convert an image to pytorch tensor | |||||
| Parameters: | |||||
| ---------- | |||||
| image: numpy array , h * w * c | |||||
| Returns: | |||||
| ------- | |||||
| image_tensor: pytorch.FloatTensor, c * h * w | |||||
| """ | |||||
image = image.astype(np.float32)  # np.float was removed in newer NumPy
return transform(image)
| def convert_chwTensor_to_hwcNumpy(tensor): | |||||
| """convert a group images pytorch tensor(count * c * h * w) to numpy array images(count * h * w * c) | |||||
| Parameters: | |||||
| ---------- | |||||
| tensor: numpy array , count * c * h * w | |||||
| Returns: | |||||
| ------- | |||||
| numpy array images: count * h * w * c | |||||
| """ | |||||
| if isinstance(tensor, Variable): | |||||
| return np.transpose(tensor.data.numpy(), (0,2,3,1)) | |||||
| elif isinstance(tensor, torch.FloatTensor): | |||||
| return np.transpose(tensor.numpy(), (0,2,3,1)) | |||||
| else: | |||||
raise Exception("convert b*c*h*w tensor to b*h*w*c numpy error. This tensor must have 4 dimensions.")
| @@ -0,0 +1,162 @@ | |||||
| import os | |||||
| import numpy as np | |||||
| class ImageDB(object): | |||||
| def __init__(self, image_annotation_file, prefix_path='', mode='train'): | |||||
| self.prefix_path = prefix_path | |||||
| self.image_annotation_file = image_annotation_file | |||||
| self.classes = ['__background__', 'face'] | |||||
| self.num_classes = 2 | |||||
| self.image_set_index = self.load_image_set_index() | |||||
| self.num_images = len(self.image_set_index) | |||||
| self.mode = mode | |||||
| def load_image_set_index(self): | |||||
| """Get image index | |||||
| Parameters: | |||||
| ---------- | |||||
| Returns: | |||||
| ------- | |||||
image_set_index: list of str
relative paths of the images
| """ | |||||
| assert os.path.exists(self.image_annotation_file), 'Path does not exist: {}'.format(self.image_annotation_file) | |||||
| with open(self.image_annotation_file, 'r') as f: | |||||
| image_set_index = [x.strip().split(' ')[0] for x in f.readlines()] | |||||
| return image_set_index | |||||
| def load_imdb(self): | |||||
| """Get and save ground truth image database | |||||
| Parameters: | |||||
| ---------- | |||||
| Returns: | |||||
| ------- | |||||
| gt_imdb: dict | |||||
| image database with annotations | |||||
| """ | |||||
| #cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl') | |||||
| #if os.path.exists(cache_file): | |||||
| # with open(cache_file, 'rb') as f: | |||||
| # imdb = cPickle.load(f) | |||||
| # print '{} gt imdb loaded from {}'.format(self.name, cache_file) | |||||
| # return imdb | |||||
| gt_imdb = self.load_annotations() | |||||
| #with open(cache_file, 'wb') as f: | |||||
| # cPickle.dump(gt_imdb, f, cPickle.HIGHEST_PROTOCOL) | |||||
| return gt_imdb | |||||
| def real_image_path(self, index): | |||||
| """Given image index, return full path | |||||
| Parameters: | |||||
| ---------- | |||||
| index: str | |||||
| relative path of image | |||||
| Returns: | |||||
| ------- | |||||
| image_file: str | |||||
| full path of image | |||||
| """ | |||||
| index = index.replace("\\", "/") | |||||
| if not os.path.exists(index): | |||||
| image_file = os.path.join(self.prefix_path, index) | |||||
| else: | |||||
| image_file=index | |||||
| if not image_file.endswith('.jpg'): | |||||
| image_file = image_file + '.jpg' | |||||
| assert os.path.exists(image_file), 'Path does not exist: {}'.format(image_file) | |||||
| return image_file | |||||
| def load_annotations(self,annotion_type=1): | |||||
| """Load annotations | |||||
| Parameters: | |||||
| ---------- | |||||
| annotion_type: int | |||||
annotation format selector (currently unused)
| Returns: | |||||
| ------- | |||||
| imdb: dict | |||||
| image database with annotations | |||||
| """ | |||||
| assert os.path.exists(self.image_annotation_file), 'annotations not found at {}'.format(self.image_annotation_file) | |||||
| with open(self.image_annotation_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| imdb = [] | |||||
| for i in range(self.num_images): | |||||
| annotation = annotations[i].strip().split(' ') | |||||
| index = annotation[0] | |||||
| im_path = self.real_image_path(index) | |||||
| imdb_ = dict() | |||||
| imdb_['image'] = im_path | |||||
| if self.mode == 'test': | |||||
| # gt_boxes = map(float, annotation[1:]) | |||||
| # boxes = np.array(bbox, dtype=np.float32).reshape(-1, 4) | |||||
| # imdb_['gt_boxes'] = boxes | |||||
| pass | |||||
| else: | |||||
| label = annotation[1] | |||||
| imdb_['label'] = int(label) | |||||
| imdb_['flipped'] = False | |||||
| imdb_['bbox_target'] = np.zeros((4,)) | |||||
| imdb_['landmark_target'] = np.zeros((10,)) | |||||
| if len(annotation[2:])==4: | |||||
| bbox_target = annotation[2:6] | |||||
| imdb_['bbox_target'] = np.array(bbox_target).astype(float) | |||||
| if len(annotation[2:])==14: | |||||
| bbox_target = annotation[2:6] | |||||
| imdb_['bbox_target'] = np.array(bbox_target).astype(float) | |||||
| landmark = annotation[6:] | |||||
| imdb_['landmark_target'] = np.array(landmark).astype(float) | |||||
| imdb.append(imdb_) | |||||
| return imdb | |||||
| def append_flipped_images(self, imdb): | |||||
| """append flipped images to imdb | |||||
| Parameters: | |||||
| ---------- | |||||
| imdb: imdb | |||||
| image database | |||||
| Returns: | |||||
| ------- | |||||
| imdb: dict | |||||
| image database with flipped image annotations added | |||||
| """ | |||||
print('append flipped images to imdb', len(imdb))
| for i in range(len(imdb)): | |||||
| imdb_ = imdb[i] | |||||
| m_bbox = imdb_['bbox_target'].copy() | |||||
| m_bbox[0], m_bbox[2] = -m_bbox[2], -m_bbox[0] | |||||
| landmark_ = imdb_['landmark_target'].copy() | |||||
| landmark_ = landmark_.reshape((5, 2)) | |||||
| landmark_ = np.asarray([(1 - x, y) for (x, y) in landmark_]) | |||||
| landmark_[[0, 1]] = landmark_[[1, 0]] | |||||
| landmark_[[3, 4]] = landmark_[[4, 3]] | |||||
| item = {'image': imdb_['image'], | |||||
| 'label': imdb_['label'], | |||||
| 'bbox_target': m_bbox, | |||||
| 'landmark_target': landmark_.reshape((10)), | |||||
| 'flipped': True} | |||||
| imdb.append(item) | |||||
| self.image_set_index *= 2 | |||||
| return imdb | |||||
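A standalone recomputation of the flip transforms in `append_flipped_images` on concrete numbers. Landmark coordinates are assumed normalized to [0, 1], and the index swaps presumably exchange the left/right eye and mouth-corner points (the usual 5-point MTCNN ordering):

```python
import numpy as np

# Bounding-box regression offsets: a horizontal flip swaps and negates
# the x components, exactly as append_flipped_images does.
bbox = np.array([0.1, 0.2, 0.3, 0.4])
m_bbox = bbox.copy()
m_bbox[0], m_bbox[2] = -m_bbox[2], -m_bbox[0]
print(m_bbox.tolist())  # [-0.3, 0.2, -0.1, 0.4]

# Landmarks: mirror x, then swap the paired points so the semantic order
# (left eye, right eye, nose, left mouth, right mouth -- assumed) is kept.
lm = np.array([[0.25, 0.25],    # point 0: left eye (assumed)
               [0.625, 0.25],   # point 1: right eye (assumed)
               [0.5, 0.5],      # point 2: nose
               [0.25, 0.75],    # point 3: left mouth corner (assumed)
               [0.625, 0.75]])  # point 4: right mouth corner (assumed)
flipped = np.asarray([(1 - x, y) for (x, y) in lm])
flipped[[0, 1]] = flipped[[1, 0]]
flipped[[3, 4]] = flipped[[4, 3]]
print(flipped[0].tolist())  # [0.375, 0.25] -- mirrored right eye becomes point 0
```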
| @@ -0,0 +1,207 @@ | |||||
| import torch | |||||
| import torch.nn as nn | |||||
| import torch.nn.functional as F | |||||
| def weights_init(m): | |||||
| if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear): | |||||
nn.init.xavier_uniform_(m.weight.data)
nn.init.constant_(m.bias, 0.1)
| class LossFn: | |||||
| def __init__(self, cls_factor=1, box_factor=1, landmark_factor=1): | |||||
| # loss function | |||||
| self.cls_factor = cls_factor | |||||
| self.box_factor = box_factor | |||||
| self.land_factor = landmark_factor | |||||
| self.loss_cls = nn.BCELoss() | |||||
| self.loss_box = nn.MSELoss() | |||||
| self.loss_landmark = nn.MSELoss() | |||||
| def cls_loss(self,gt_label,pred_label): | |||||
| pred_label = torch.squeeze(pred_label) | |||||
| gt_label = torch.squeeze(gt_label) | |||||
# keep labels >= 0: only 0 (negative) and 1 (positive) affect the detection loss
| mask = torch.ge(gt_label,0) | |||||
| valid_gt_label = torch.masked_select(gt_label,mask) | |||||
| valid_pred_label = torch.masked_select(pred_label,mask) | |||||
| return self.loss_cls(valid_pred_label,valid_gt_label)*self.cls_factor | |||||
| def box_loss(self,gt_label,gt_offset,pred_offset): | |||||
| pred_offset = torch.squeeze(pred_offset) | |||||
| gt_offset = torch.squeeze(gt_offset) | |||||
| gt_label = torch.squeeze(gt_label) | |||||
| #get the mask element which != 0 | |||||
| unmask = torch.eq(gt_label,0) | |||||
| mask = torch.eq(unmask,0) | |||||
| #convert mask to dim index | |||||
| chose_index = torch.nonzero(mask.data) | |||||
| chose_index = torch.squeeze(chose_index) | |||||
# only valid elements affect the loss
| valid_gt_offset = gt_offset[chose_index,:] | |||||
| valid_pred_offset = pred_offset[chose_index,:] | |||||
| return self.loss_box(valid_pred_offset,valid_gt_offset)*self.box_factor | |||||
| def landmark_loss(self,gt_label,gt_landmark,pred_landmark): | |||||
| pred_landmark = torch.squeeze(pred_landmark) | |||||
| gt_landmark = torch.squeeze(gt_landmark) | |||||
| gt_label = torch.squeeze(gt_label) | |||||
| mask = torch.eq(gt_label,-2) | |||||
| chose_index = torch.nonzero(mask.data) | |||||
| chose_index = torch.squeeze(chose_index) | |||||
| valid_gt_landmark = gt_landmark[chose_index, :] | |||||
| valid_pred_landmark = pred_landmark[chose_index, :] | |||||
| return self.loss_landmark(valid_pred_landmark,valid_gt_landmark)*self.land_factor | |||||
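The three losses above each select a different subset of the mini-batch by label value: `cls_loss` keeps labels >= 0, `box_loss` keeps labels != 0, and `landmark_loss` keeps labels == -2 (`torch.masked_select` behaves like the boolean indexing below). A NumPy sketch of the selection, assuming the usual MTCNN label convention (1 positive, 0 negative, -1 part face, -2 landmark sample):

```python
import numpy as np

labels = np.array([1, 0, -1, -2, 1])

cls_mask = labels >= 0   # classification: positives and negatives only
box_mask = labels != 0   # bbox regression: everything except negatives
lmk_mask = labels == -2  # landmark regression: landmark samples only

print(labels[cls_mask].tolist())  # [1, 0, 1]
print(labels[box_mask].tolist())  # [1, -1, -2, 1]
print(labels[lmk_mask].tolist())  # [-2]
```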
| class PNet(nn.Module): | |||||
| ''' PNet ''' | |||||
| def __init__(self, is_train=False, use_cuda=True): | |||||
| super(PNet, self).__init__() | |||||
| self.is_train = is_train | |||||
| self.use_cuda = use_cuda | |||||
| # backend | |||||
| self.pre_layer = nn.Sequential( | |||||
| nn.Conv2d(3, 10, kernel_size=3, stride=1), # conv1 | |||||
| nn.PReLU(), # PReLU1 | |||||
| nn.MaxPool2d(kernel_size=2, stride=2), # pool1 | |||||
| nn.Conv2d(10, 16, kernel_size=3, stride=1), # conv2 | |||||
| nn.PReLU(), # PReLU2 | |||||
| nn.Conv2d(16, 32, kernel_size=3, stride=1), # conv3 | |||||
| nn.PReLU() # PReLU3 | |||||
| ) | |||||
| # detection | |||||
| self.conv4_1 = nn.Conv2d(32, 1, kernel_size=1, stride=1) | |||||
| # bounding box regresion | |||||
| self.conv4_2 = nn.Conv2d(32, 4, kernel_size=1, stride=1) | |||||
| # landmark localization | |||||
| self.conv4_3 = nn.Conv2d(32, 10, kernel_size=1, stride=1) | |||||
# weight initialization with xavier
| self.apply(weights_init) | |||||
| def forward(self, x): | |||||
| x = self.pre_layer(x) | |||||
label = torch.sigmoid(self.conv4_1(x))
| offset = self.conv4_2(x) | |||||
| # landmark = self.conv4_3(x) | |||||
| if self.is_train is True: | |||||
| # label_loss = LossUtil.label_loss(self.gt_label,torch.squeeze(label)) | |||||
| # bbox_loss = LossUtil.bbox_loss(self.gt_bbox,torch.squeeze(offset)) | |||||
| return label,offset | |||||
| #landmark = self.conv4_3(x) | |||||
| return label, offset | |||||
| class RNet(nn.Module): | |||||
| ''' RNet ''' | |||||
| def __init__(self,is_train=False, use_cuda=True): | |||||
| super(RNet, self).__init__() | |||||
| self.is_train = is_train | |||||
| self.use_cuda = use_cuda | |||||
| # backend | |||||
| self.pre_layer = nn.Sequential( | |||||
| nn.Conv2d(3, 28, kernel_size=3, stride=1), # conv1 | |||||
| nn.PReLU(), # prelu1 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool1 | |||||
| nn.Conv2d(28, 48, kernel_size=3, stride=1), # conv2 | |||||
| nn.PReLU(), # prelu2 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool2 | |||||
| nn.Conv2d(48, 64, kernel_size=2, stride=1), # conv3 | |||||
| nn.PReLU() # prelu3 | |||||
| ) | |||||
| self.conv4 = nn.Linear(64*2*2, 128) # conv4 | |||||
| self.prelu4 = nn.PReLU() # prelu4 | |||||
| # detection | |||||
| self.conv5_1 = nn.Linear(128, 1) | |||||
| # bounding box regression | |||||
| self.conv5_2 = nn.Linear(128, 4) | |||||
# landmark localization
| self.conv5_3 = nn.Linear(128, 10) | |||||
# weight initialization with xavier
| self.apply(weights_init) | |||||
| def forward(self, x): | |||||
| # backend | |||||
| x = self.pre_layer(x) | |||||
| x = x.view(x.size(0), -1) | |||||
| x = self.conv4(x) | |||||
| x = self.prelu4(x) | |||||
| # detection | |||||
| det = torch.sigmoid(self.conv5_1(x)) | |||||
| box = self.conv5_2(x) | |||||
| # landmark = self.conv5_3(x) | |||||
| if self.is_train is True: | |||||
| return det, box | |||||
| #landmard = self.conv5_3(x) | |||||
| return det, box | |||||
| class ONet(nn.Module): | |||||
''' ONet '''
| def __init__(self,is_train=False, use_cuda=True): | |||||
| super(ONet, self).__init__() | |||||
| self.is_train = is_train | |||||
| self.use_cuda = use_cuda | |||||
| # backend | |||||
| self.pre_layer = nn.Sequential( | |||||
| nn.Conv2d(3, 32, kernel_size=3, stride=1), # conv1 | |||||
| nn.PReLU(), # prelu1 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool1 | |||||
| nn.Conv2d(32, 64, kernel_size=3, stride=1), # conv2 | |||||
| nn.PReLU(), # prelu2 | |||||
| nn.MaxPool2d(kernel_size=3, stride=2), # pool2 | |||||
| nn.Conv2d(64, 64, kernel_size=3, stride=1), # conv3 | |||||
| nn.PReLU(), # prelu3 | |||||
| nn.MaxPool2d(kernel_size=2,stride=2), # pool3 | |||||
| nn.Conv2d(64,128,kernel_size=2,stride=1), # conv4 | |||||
| nn.PReLU() # prelu4 | |||||
| ) | |||||
| self.conv5 = nn.Linear(128*2*2, 256) # conv5 | |||||
| self.prelu5 = nn.PReLU() # prelu5 | |||||
| # detection | |||||
| self.conv6_1 = nn.Linear(256, 1) | |||||
| # bounding box regression | |||||
| self.conv6_2 = nn.Linear(256, 4) | |||||
# landmark localization
| self.conv6_3 = nn.Linear(256, 10) | |||||
# weight initialization with xavier
| self.apply(weights_init) | |||||
| def forward(self, x): | |||||
| # backend | |||||
| x = self.pre_layer(x) | |||||
| x = x.view(x.size(0), -1) | |||||
| x = self.conv5(x) | |||||
| x = self.prelu5(x) | |||||
| # detection | |||||
| det = torch.sigmoid(self.conv6_1(x)) | |||||
| box = self.conv6_2(x) | |||||
| landmark = self.conv6_3(x) | |||||
| if self.is_train is True: | |||||
| return det, box, landmark | |||||
| #landmard = self.conv5_3(x) | |||||
| return det, box, landmark | |||||
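The fully-connected sizes above (`64*2*2` for RNet, `128*2*2` for ONet) follow from the canonical MTCNN input sizes of 12, 24, and 48 pixels. A quick arithmetic check, using PyTorch's default `ceil_mode=False` pooling (i.e. floor division) and no padding:

```python
def stack(n, layers):
    # spatial size after a chain of conv/pool layers, each (kernel, stride),
    # with no padding and floor rounding (MaxPool2d default ceil_mode=False)
    for k, s in layers:
        n = (n - k) // s + 1
    return n

# PNet: conv3 -> pool2/2 -> conv3 -> conv3, 12x12 input -> 1x1 map
p = stack(12, [(3, 1), (2, 2), (3, 1), (3, 1)])
# RNet: 24x24 input -> 2x2 before conv4 = nn.Linear(64*2*2, 128)
r = stack(24, [(3, 1), (3, 2), (3, 1), (3, 2), (2, 1)])
# ONet: 48x48 input -> 2x2 before conv5 = nn.Linear(128*2*2, 256)
o = stack(48, [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1), (2, 2), (2, 1)])

print(p, r, o)  # 1 2 2
```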
| @@ -0,0 +1,42 @@ | |||||
| import numpy as np | |||||
| def torch_nms(dets, thresh, mode="Union"): | |||||
| """ | |||||
| greedily select boxes with high confidence | |||||
| keep boxes overlap <= thresh | |||||
| rule out overlap > thresh | |||||
| :param dets: [[x1, y1, x2, y2 score]] | |||||
| :param thresh: retain overlap <= thresh | |||||
| :return: indexes to keep | |||||
| """ | |||||
| x1 = dets[:, 0] | |||||
| y1 = dets[:, 1] | |||||
| x2 = dets[:, 2] | |||||
| y2 = dets[:, 3] | |||||
| scores = dets[:, 4] | |||||
| areas = (x2 - x1 + 1) * (y2 - y1 + 1) | |||||
| order = scores.argsort()[::-1] | |||||
| keep = [] | |||||
| while order.size > 0: | |||||
| i = order[0] | |||||
| keep.append(i) | |||||
| xx1 = np.maximum(x1[i], x1[order[1:]]) | |||||
| yy1 = np.maximum(y1[i], y1[order[1:]]) | |||||
| xx2 = np.minimum(x2[i], x2[order[1:]]) | |||||
| yy2 = np.minimum(y2[i], y2[order[1:]]) | |||||
| w = np.maximum(0.0, xx2 - xx1 + 1) | |||||
| h = np.maximum(0.0, yy2 - yy1 + 1) | |||||
| inter = w * h | |||||
| if mode == "Union": | |||||
| ovr = inter / (areas[i] + areas[order[1:]] - inter) | |||||
| elif mode == "Minimum": | |||||
| ovr = inter / np.minimum(areas[i], areas[order[1:]]) | |||||
| inds = np.where(ovr <= thresh)[0] | |||||
| order = order[inds + 1] | |||||
| return keep | |||||
| @@ -0,0 +1,2 @@ | |||||
| import numpy as np | |||||
| @@ -0,0 +1,101 @@ | |||||
| import numpy as np | |||||
| def IoU(box, boxes): | |||||
| """Compute IoU between detect box and gt boxes | |||||
| Parameters: | |||||
| ---------- | |||||
| box: numpy array , shape (5, ): x1, y1, x2, y2, score | |||||
| input box | |||||
| boxes: numpy array, shape (n, 4): x1, y1, x2, y2 | |||||
| input ground truth boxes | |||||
| Returns: | |||||
| ------- | |||||
| ovr: numpy.array, shape (n, ) | |||||
| IoU | |||||
| """ | |||||
| box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1) | |||||
| area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1) | |||||
| xx1 = np.maximum(box[0], boxes[:, 0]) | |||||
| yy1 = np.maximum(box[1], boxes[:, 1]) | |||||
| xx2 = np.minimum(box[2], boxes[:, 2]) | |||||
| yy2 = np.minimum(box[3], boxes[:, 3]) | |||||
| # compute the width and height of the bounding box | |||||
| w = np.maximum(0, xx2 - xx1 + 1) | |||||
| h = np.maximum(0, yy2 - yy1 + 1) | |||||
| inter = w * h | |||||
| ovr = np.true_divide(inter,(box_area + area - inter)) | |||||
| #ovr = inter / (box_area + area - inter) | |||||
| return ovr | |||||
| def convert_to_square(bbox): | |||||
| """Convert bbox to square | |||||
| Parameters: | |||||
| ---------- | |||||
| bbox: numpy array , shape n x 5 | |||||
| input bbox | |||||
| Returns: | |||||
| ------- | |||||
| square bbox | |||||
| """ | |||||
| square_bbox = bbox.copy() | |||||
| h = bbox[:, 3] - bbox[:, 1] + 1 | |||||
| w = bbox[:, 2] - bbox[:, 0] + 1 | |||||
| max_side = np.maximum(h,w) | |||||
| square_bbox[:, 0] = bbox[:, 0] + w*0.5 - max_side*0.5 | |||||
| square_bbox[:, 1] = bbox[:, 1] + h*0.5 - max_side*0.5 | |||||
| square_bbox[:, 2] = square_bbox[:, 0] + max_side - 1 | |||||
| square_bbox[:, 3] = square_bbox[:, 1] + max_side - 1 | |||||
| return square_bbox | |||||
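The same arithmetic as `convert_to_square`, applied to one concrete wide box for illustration: the longer side is kept and the box is recentred along the shorter axis.

```python
import numpy as np

bbox = np.array([[10., 20., 49., 29., 0.9]])  # w = 40, h = 10
h = bbox[:, 3] - bbox[:, 1] + 1
w = bbox[:, 2] - bbox[:, 0] + 1
max_side = np.maximum(h, w)                   # 40

sq = bbox.copy()
sq[:, 0] = bbox[:, 0] + w * 0.5 - max_side * 0.5  # x unchanged (w is the max side)
sq[:, 1] = bbox[:, 1] + h * 0.5 - max_side * 0.5  # y recentred upward
sq[:, 2] = sq[:, 0] + max_side - 1
sq[:, 3] = sq[:, 1] + max_side - 1

print(sq[0, :4].tolist())  # [10.0, 5.0, 49.0, 44.0] -- a 40x40 square
```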
| def nms(dets, thresh, mode="Union"): | |||||
| """ | |||||
| greedily select boxes with high confidence | |||||
| keep boxes overlap <= thresh | |||||
| rule out overlap > thresh | |||||
| :param dets: [[x1, y1, x2, y2 score]] | |||||
| :param thresh: retain overlap <= thresh | |||||
| :return: indexes to keep | |||||
| """ | |||||
| x1 = dets[:, 0] | |||||
| y1 = dets[:, 1] | |||||
| x2 = dets[:, 2] | |||||
| y2 = dets[:, 3] | |||||
| scores = dets[:, 4] | |||||
| areas = (x2 - x1 + 1) * (y2 - y1 + 1) | |||||
| order = scores.argsort()[::-1] | |||||
| keep = [] | |||||
| while order.size > 0: | |||||
| i = order[0] | |||||
| keep.append(i) | |||||
| xx1 = np.maximum(x1[i], x1[order[1:]]) | |||||
| yy1 = np.maximum(y1[i], y1[order[1:]]) | |||||
| xx2 = np.minimum(x2[i], x2[order[1:]]) | |||||
| yy2 = np.minimum(y2[i], y2[order[1:]]) | |||||
| w = np.maximum(0.0, xx2 - xx1 + 1) | |||||
| h = np.maximum(0.0, yy2 - yy1 + 1) | |||||
| inter = w * h | |||||
| if mode == "Union": | |||||
| ovr = inter / (areas[i] + areas[order[1:]] - inter) | |||||
| elif mode == "Minimum": | |||||
| ovr = inter / np.minimum(areas[i], areas[order[1:]]) | |||||
| inds = np.where(ovr <= thresh)[0] | |||||
| order = order[inds + 1] | |||||
| return keep | |||||
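"Minimum" mode divides the intersection by the smaller area instead of the union, so a detection fully contained in a larger one scores 1.0 and is suppressed even when its union IoU is low. A standalone recomputation of the loop's overlap arithmetic on two concrete boxes:

```python
# Two boxes: the second is fully contained in the first.
a = [0, 0, 9, 9]   # area 10 * 10 = 100
b = [2, 2, 6, 6]   # area  5 *  5 =  25

xx1, yy1 = max(a[0], b[0]), max(a[1], b[1])
xx2, yy2 = min(a[2], b[2]), min(a[3], b[3])
inter = max(0, xx2 - xx1 + 1) * max(0, yy2 - yy1 + 1)  # 25
area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)

union_iou = inter / (area_a + area_b - inter)  # "Union" mode
minimum_iou = inter / min(area_a, area_b)      # "Minimum" mode

print(union_iou, minimum_iou)  # 0.25 1.0
```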
| @@ -0,0 +1,141 @@ | |||||
| from matplotlib.patches import Circle | |||||
| def vis_two(im_array, dets1, dets2, thresh=0.9): | |||||
| """Visualize detection results before and after calibration | |||||
| Parameters: | |||||
| ---------- | |||||
| im_array: numpy.ndarray, shape(1, c, h, w) | |||||
| test image in rgb | |||||
| dets1: numpy.ndarray([[x1 y1 x2 y2 score]]) | |||||
| detection results before calibration | |||||
| dets2: numpy.ndarray([[x1 y1 x2 y2 score]]) | |||||
| detection results after calibration | |||||
| thresh: float | |||||
| boxes with scores > thresh will be drawn in red otherwise yellow | |||||
| Returns: | |||||
| ------- | |||||
| """ | |||||
| import matplotlib.pyplot as plt | |||||
| import random | |||||
| figure = plt.figure() | |||||
| plt.subplot(121) | |||||
| plt.imshow(im_array) | |||||
| color = 'yellow' | |||||
| for i in range(dets1.shape[0]): | |||||
| bbox = dets1[i, :4] | |||||
| landmarks = dets1[i, 5:] | |||||
| score = dets1[i, 4] | |||||
| if score > thresh: | |||||
| rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| bbox[2] - bbox[0], | |||||
| bbox[3] - bbox[1], fill=False, | |||||
| edgecolor='red', linewidth=0.7) | |||||
| plt.gca().add_patch(rect) | |||||
| landmarks = landmarks.reshape((5,2)) | |||||
| for j in range(5): | |||||
| plt.scatter(landmarks[j,0],landmarks[j,1],c='yellow',linewidths=0.1, marker='x', s=5) | |||||
| # plt.gca().text(bbox[0], bbox[1] - 2, | |||||
| # '{:.3f}'.format(score), | |||||
| # bbox=dict(facecolor='blue', alpha=0.5), fontsize=12, color='white') | |||||
| # else: | |||||
| # rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| # bbox[2] - bbox[0], | |||||
| # bbox[3] - bbox[1], fill=False, | |||||
| # edgecolor=color, linewidth=0.5) | |||||
| # plt.gca().add_patch(rect) | |||||
| plt.subplot(122) | |||||
| plt.imshow(im_array) | |||||
| color = 'yellow' | |||||
| for i in range(dets2.shape[0]): | |||||
| bbox = dets2[i, :4] | |||||
landmarks = dets2[i, 5:]
| score = dets2[i, 4] | |||||
| if score > thresh: | |||||
| rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| bbox[2] - bbox[0], | |||||
| bbox[3] - bbox[1], fill=False, | |||||
| edgecolor='red', linewidth=0.7) | |||||
| plt.gca().add_patch(rect) | |||||
| landmarks = landmarks.reshape((5, 2)) | |||||
| for j in range(5): | |||||
| plt.scatter(landmarks[j, 0], landmarks[j, 1], c='yellow',linewidths=0.1, marker='x', s=5) | |||||
| # plt.gca().text(bbox[0], bbox[1] - 2, | |||||
| # '{:.3f}'.format(score), | |||||
| # bbox=dict(facecolor='blue', alpha=0.5), fontsize=12, color='white') | |||||
| # else: | |||||
| # rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| # bbox[2] - bbox[0], | |||||
| # bbox[3] - bbox[1], fill=False, | |||||
| # edgecolor=color, linewidth=0.5) | |||||
| # plt.gca().add_patch(rect) | |||||
| plt.show() | |||||
def vis_face(im_array, dets, landmarks=None):
"""Visualize detection results with optional facial landmarks
Parameters:
----------
im_array: numpy.ndarray, shape(1, c, h, w)
test image in rgb
dets: numpy.ndarray([[x1 y1 x2 y2 score]])
detection results
landmarks: numpy.ndarray, shape (n, 10), optional
five (x, y) landmark points per detection
Returns:
-------
"""
| import matplotlib.pyplot as plt | |||||
| import random | |||||
| import pylab | |||||
| figure = pylab.figure() | |||||
| # plt.subplot(121) | |||||
| pylab.imshow(im_array) | |||||
| figure.suptitle('DFace Detector', fontsize=20) | |||||
| for i in range(dets.shape[0]): | |||||
| bbox = dets[i, :4] | |||||
| rect = pylab.Rectangle((bbox[0], bbox[1]), | |||||
| bbox[2] - bbox[0], | |||||
| bbox[3] - bbox[1], fill=False, | |||||
| edgecolor='yellow', linewidth=0.9) | |||||
| pylab.gca().add_patch(rect) | |||||
| if landmarks is not None: | |||||
| for i in range(landmarks.shape[0]): | |||||
| landmarks_one = landmarks[i, :] | |||||
| landmarks_one = landmarks_one.reshape((5, 2)) | |||||
| for j in range(5): | |||||
| # pylab.scatter(landmarks_one[j, 0], landmarks_one[j, 1], c='yellow', linewidths=0.1, marker='x', s=5) | |||||
| cir1 = Circle(xy=(landmarks_one[j, 0], landmarks_one[j, 1]), radius=2, alpha=0.4, color="red") | |||||
| pylab.gca().add_patch(cir1) | |||||
| # plt.gca().text(bbox[0], bbox[1] - 2, | |||||
| # '{:.3f}'.format(score), | |||||
| # bbox=dict(facecolor='blue', alpha=0.5), fontsize=12, color='white') | |||||
| # else: | |||||
| # rect = plt.Rectangle((bbox[0], bbox[1]), | |||||
| # bbox[2] - bbox[0], | |||||
| # bbox[3] - bbox[1], fill=False, | |||||
| # edgecolor=color, linewidth=0.5) | |||||
| # plt.gca().add_patch(rect) | |||||
| pylab.show() | |||||
| @@ -0,0 +1,35 @@ | |||||
| import os | |||||
| import numpy.random as npr | |||||
| import numpy as np | |||||
| def assemble_data(output_file, anno_file_list=[]): | |||||
| #assemble the annotations to one file | |||||
| size = 12 | |||||
| if len(anno_file_list)==0: | |||||
| return 0 | |||||
| if os.path.exists(output_file): | |||||
| os.remove(output_file) | |||||
| for anno_file in anno_file_list: | |||||
| with open(anno_file, 'r') as f: | |||||
| anno_lines = f.readlines() | |||||
| base_num = 250000 | |||||
| if len(anno_lines) > base_num * 3: | |||||
| idx_keep = npr.choice(len(anno_lines), size=base_num * 3, replace=True) | |||||
| elif len(anno_lines) > 100000: | |||||
| idx_keep = npr.choice(len(anno_lines), size=len(anno_lines), replace=True) | |||||
| else: | |||||
| idx_keep = np.arange(len(anno_lines)) | |||||
| np.random.shuffle(idx_keep) | |||||
| chose_count = 0 | |||||
| with open(output_file, 'a+') as f: | |||||
| for idx in idx_keep: | |||||
| f.write(anno_lines[idx]) | |||||
| chose_count+=1 | |||||
| return chose_count | |||||
| @@ -0,0 +1,25 @@ | |||||
| import os | |||||
| import config | |||||
| import assemble as assemble | |||||
| if __name__ == '__main__': | |||||
| anno_list = [] | |||||
| net_landmark_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_LANDMARK_ANNO_FILENAME) | |||||
| net_postive_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_POSTIVE_ANNO_FILENAME) | |||||
| net_part_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_PART_ANNO_FILENAME) | |||||
| net_neg_file = os.path.join(config.ANNO_STORE_DIR,config.ONET_NEGATIVE_ANNO_FILENAME) | |||||
| anno_list.append(net_postive_file) | |||||
| anno_list.append(net_part_file) | |||||
| anno_list.append(net_neg_file) | |||||
| anno_list.append(net_landmark_file) | |||||
| imglist_filename = config.ONET_TRAIN_IMGLIST_FILENAME | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| imglist_file = os.path.join(anno_dir, imglist_filename) | |||||
| chose_count = assemble.assemble_data(imglist_file ,anno_list) | |||||
print("ONet train annotation result file path: %s" % imglist_file)
| @@ -0,0 +1,25 @@ | |||||
| import os | |||||
| import config | |||||
| import assemble as assemble | |||||
| if __name__ == '__main__': | |||||
| anno_list = [] | |||||
| # pnet_landmark_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_LANDMARK_ANNO_FILENAME) | |||||
| pnet_postive_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_POSTIVE_ANNO_FILENAME) | |||||
| pnet_part_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_PART_ANNO_FILENAME) | |||||
| pnet_neg_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_NEGATIVE_ANNO_FILENAME) | |||||
| anno_list.append(pnet_postive_file) | |||||
| anno_list.append(pnet_part_file) | |||||
| anno_list.append(pnet_neg_file) | |||||
| # anno_list.append(pnet_landmark_file) | |||||
| imglist_filename = config.PNET_TRAIN_IMGLIST_FILENAME | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| imglist_file = os.path.join(anno_dir, imglist_filename) | |||||
| chose_count = assemble.assemble_data(imglist_file, anno_list) | |||||
| print "PNet train annotation result file path:%s" % imglist_file | |||||
| @@ -0,0 +1,25 @@ | |||||
| import os | |||||
| import config | |||||
| import assemble as assemble | |||||
| if __name__ == '__main__': | |||||
| anno_list = [] | |||||
| # rnet_landmark_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_LANDMARK_ANNO_FILENAME) | |||||
| rnet_postive_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_POSTIVE_ANNO_FILENAME) | |||||
| rnet_part_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_PART_ANNO_FILENAME) | |||||
| rnet_neg_file = os.path.join(config.ANNO_STORE_DIR,config.RNET_NEGATIVE_ANNO_FILENAME) | |||||
| anno_list.append(rnet_postive_file) | |||||
| anno_list.append(rnet_part_file) | |||||
| anno_list.append(rnet_neg_file) | |||||
| # anno_list.append(rnet_landmark_file) | |||||
| imglist_filename = config.RNET_TRAIN_IMGLIST_FILENAME | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| imglist_file = os.path.join(anno_dir, imglist_filename) | |||||
| chose_count = assemble.assemble_data(imglist_file, anno_list) | |||||
| print "RNet train annotation result file path:%s" % imglist_file | |||||
| @@ -0,0 +1,220 @@ | |||||
| import argparse | |||||
| import cv2 | |||||
| import numpy as np | |||||
| from core.detect import MtcnnDetector,create_mtcnn_net | |||||
| from core.imagedb import ImageDB | |||||
| from core.image_reader import TestImageLoader | |||||
| import time | |||||
| import os | |||||
| import cPickle | |||||
| from core.utils import convert_to_square,IoU | |||||
| import config | |||||
| import core.vision as vision | |||||
| def gen_onet_data(data_dir, anno_file, pnet_model_file, rnet_model_file, prefix_path='', use_cuda=True, vis=False): | |||||
| pnet, rnet, _ = create_mtcnn_net(p_model_path=pnet_model_file, r_model_path=rnet_model_file, use_cuda=use_cuda) | |||||
| mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, min_face_size=12) | |||||
| imagedb = ImageDB(anno_file,mode="test",prefix_path=prefix_path) | |||||
| imdb = imagedb.load_imdb() | |||||
| image_reader = TestImageLoader(imdb,1,False) | |||||
| all_boxes = list() | |||||
| batch_idx = 0 | |||||
| for databatch in image_reader: | |||||
| if batch_idx % 100 == 0: | |||||
| print "%d images done" % batch_idx | |||||
| im = databatch | |||||
| t = time.time() | |||||
| p_boxes, p_boxes_align = mtcnn_detector.detect_pnet(im=im) | |||||
| boxes, boxes_align = mtcnn_detector.detect_rnet(im=im, dets=p_boxes_align) | |||||
| if boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| if vis: | |||||
| rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB) | |||||
| vision.vis_two(rgb_im, boxes, boxes_align) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| all_boxes.append(boxes_align) | |||||
| batch_idx += 1 | |||||
| save_path = config.MODEL_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.mkdir(save_path) | |||||
| save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time())) | |||||
| with open(save_file, 'wb') as f: | |||||
| cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL) | |||||
| gen_onet_sample_data(data_dir,anno_file,save_file) | |||||
| def gen_onet_sample_data(data_dir,anno_file,det_boxs_file): | |||||
| neg_save_dir = os.path.join(data_dir, "48/negative") | |||||
| pos_save_dir = os.path.join(data_dir, "48/positive") | |||||
| part_save_dir = os.path.join(data_dir, "48/part") | |||||
| for dir_path in [neg_save_dir, pos_save_dir, part_save_dir]: | |||||
| if not os.path.exists(dir_path): | |||||
| os.makedirs(dir_path) | |||||
| # load ground truth from annotation file | |||||
| # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| image_size = 48 | |||||
| net = "onet" | |||||
| im_idx_list = list() | |||||
| gt_boxes_list = list() | |||||
| num_of_images = len(annotations) | |||||
| print "processing %d images in total" % num_of_images | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_idx = annotation[0] | |||||
| boxes = map(float, annotation[1:]) | |||||
| boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4) | |||||
| im_idx_list.append(im_idx) | |||||
| gt_boxes_list.append(boxes) | |||||
| save_path = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.makedirs(save_path) | |||||
| f1 = open(os.path.join(save_path, 'pos_%d.txt' % image_size), 'w') | |||||
| f2 = open(os.path.join(save_path, 'neg_%d.txt' % image_size), 'w') | |||||
| f3 = open(os.path.join(save_path, 'part_%d.txt' % image_size), 'w') | |||||
| # the detections were pickled in binary mode, so read them back in binary mode | |||||
| det_handle = open(det_boxs_file, 'rb') | |||||
| det_boxes = cPickle.load(det_handle) | |||||
| print len(det_boxes), num_of_images | |||||
| assert len(det_boxes) == num_of_images, "incorrect detections or ground truths" | |||||
| # index of neg, pos and part face, used as their image names | |||||
| n_idx = 0 | |||||
| p_idx = 0 | |||||
| d_idx = 0 | |||||
| image_done = 0 | |||||
| for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list): | |||||
| if image_done % 100 == 0: | |||||
| print "%d images done" % image_done | |||||
| image_done += 1 | |||||
| if dets.shape[0] == 0: | |||||
| continue | |||||
| img = cv2.imread(im_idx) | |||||
| dets = convert_to_square(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| for box in dets: | |||||
| x_left, y_top, x_right, y_bottom = box[0:4].astype(int) | |||||
| width = x_right - x_left + 1 | |||||
| height = y_bottom - y_top + 1 | |||||
| # ignore box that is too small or beyond image border | |||||
| if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1: | |||||
| continue | |||||
| # compute intersection over union(IoU) between current box and all gt boxes | |||||
| Iou = IoU(box, gts) | |||||
| cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (image_size, image_size), | |||||
| interpolation=cv2.INTER_LINEAR) | |||||
| # save negative images and write label | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| else: | |||||
| # find gt_box with the highest iou | |||||
| idx = np.argmax(Iou) | |||||
| assigned_gt = gts[idx] | |||||
| x1, y1, x2, y2 = assigned_gt | |||||
| # compute bbox reg label | |||||
| offset_x1 = (x1 - x_left) / float(width) | |||||
| offset_y1 = (y1 - y_top) / float(height) | |||||
| offset_x2 = (x2 - x_right) / float(width) | |||||
| offset_y2 = (y2 - y_bottom) / float(height) | |||||
| # save positive and part-face images and write labels | |||||
| if np.max(Iou) >= 0.65: | |||||
| save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx) | |||||
| f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| elif np.max(Iou) >= 0.4: | |||||
| save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx) | |||||
| f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| d_idx += 1 | |||||
| f1.close() | |||||
| f2.close() | |||||
| f3.close() | |||||
| def model_store_path(): | |||||
| return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store" | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='output data folder', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--pmodel_file', dest='pnet_model_file', help='PNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/pnet_epoch_5best.pt', type=str) | |||||
| parser.add_argument('--rmodel_file', dest='rnet_model_file', help='RNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/rnet_epoch_1.pt', type=str) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_onet_data(args.dataset_path, args.annotation_file, args.pnet_model_file, args.rnet_model_file, args.prefix_path, args.use_cuda) | |||||
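The sampling logic above hinges on `core.utils.IoU`, whose implementation is not shown in this diff. As a hedged sketch (the exact code lives in `core/utils.py`; the formulas below follow the standard pixel-inclusive convention, matching the `+ 1` width/height arithmetic used in these scripts), IoU between one candidate box and an array of ground-truth boxes can be computed as:

```python
import numpy as np

def iou(box, boxes):
    # box: [x1, y1, x2, y2]; boxes: N x 4 array of ground-truth boxes
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    # corners of the intersection rectangles
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    # clamp at zero so disjoint boxes contribute no overlap
    w = np.maximum(0.0, xx2 - xx1 + 1)
    h = np.maximum(0.0, yy2 - yy1 + 1)
    inter = w * h
    return inter / (box_area + areas - inter)
```

The thresholds used above (`< 0.3` negative, `>= 0.4` part, `>= 0.65` positive) are then applied to `np.max` of this vector.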
| @@ -0,0 +1,174 @@ | |||||
| import argparse | |||||
| import numpy as np | |||||
| import cv2 | |||||
| import os | |||||
| import numpy.random as npr | |||||
| from core.utils import IoU | |||||
| import config | |||||
| def gen_pnet_data(data_dir,anno_file): | |||||
| neg_save_dir = os.path.join(data_dir,"12/negative") | |||||
| pos_save_dir = os.path.join(data_dir,"12/positive") | |||||
| part_save_dir = os.path.join(data_dir,"12/part") | |||||
| for dir_path in [neg_save_dir,pos_save_dir,part_save_dir]: | |||||
| if not os.path.exists(dir_path): | |||||
| os.makedirs(dir_path) | |||||
| save_dir = os.path.join(data_dir,"pnet") | |||||
| if not os.path.exists(save_dir): | |||||
| os.mkdir(save_dir) | |||||
| post_save_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_POSTIVE_ANNO_FILENAME) | |||||
| neg_save_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_NEGATIVE_ANNO_FILENAME) | |||||
| part_save_file = os.path.join(config.ANNO_STORE_DIR,config.PNET_PART_ANNO_FILENAME) | |||||
| f1 = open(post_save_file, 'w') | |||||
| f2 = open(neg_save_file, 'w') | |||||
| f3 = open(part_save_file, 'w') | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| num = len(annotations) | |||||
| print "%d pics in total" % num | |||||
| p_idx = 0 | |||||
| n_idx = 0 | |||||
| d_idx = 0 | |||||
| idx = 0 | |||||
| box_idx = 0 | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_path = annotation[0] | |||||
| bbox = map(float, annotation[1:]) | |||||
| boxes = np.array(bbox, dtype=np.int32).reshape(-1, 4) | |||||
| img = cv2.imread(im_path) | |||||
| idx += 1 | |||||
| if idx % 100 == 0: | |||||
| print idx, "images done" | |||||
| height, width, channel = img.shape | |||||
| neg_num = 0 | |||||
| while neg_num < 50: | |||||
| size = npr.randint(12, min(width, height) / 2) | |||||
| nx = npr.randint(0, width - size) | |||||
| ny = npr.randint(0, height - size) | |||||
| crop_box = np.array([nx, ny, nx + size, ny + size]) | |||||
| Iou = IoU(crop_box, boxes) | |||||
| cropped_im = img[ny : ny + size, nx : nx + size, :] | |||||
| resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg"%n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| neg_num += 1 | |||||
| for box in boxes: | |||||
| # box (x_left, y_top, x_right, y_bottom) | |||||
| x1, y1, x2, y2 = box | |||||
| w = x2 - x1 + 1 | |||||
| h = y2 - y1 + 1 | |||||
| # ignore small faces | |||||
| # in case the ground truth boxes of small faces are not accurate | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # generate negative examples that have overlap with gt | |||||
| for i in range(5): | |||||
| size = npr.randint(12, min(width, height) / 2) | |||||
| # delta_x and delta_y are offsets of (x1, y1) | |||||
| delta_x = npr.randint(max(-size, -x1), w) | |||||
| delta_y = npr.randint(max(-size, -y1), h) | |||||
| nx1 = max(0, x1 + delta_x) | |||||
| ny1 = max(0, y1 + delta_y) | |||||
| if nx1 + size > width or ny1 + size > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx1 + size, ny1 + size]) | |||||
| Iou = IoU(crop_box, boxes) | |||||
| cropped_im = img[ny1 : ny1 + size, nx1 : nx1 + size, :] | |||||
| resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg"%n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| # generate positive examples and part faces | |||||
| for i in range(20): | |||||
| size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h))) | |||||
| # delta here is the offset of box center | |||||
| delta_x = npr.randint(-w * 0.2, w * 0.2) | |||||
| delta_y = npr.randint(-h * 0.2, h * 0.2) | |||||
| nx1 = max(x1 + w / 2 + delta_x - size / 2, 0) | |||||
| ny1 = max(y1 + h / 2 + delta_y - size / 2, 0) | |||||
| nx2 = nx1 + size | |||||
| ny2 = ny1 + size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| offset_x1 = (x1 - nx1) / float(size) | |||||
| offset_y1 = (y1 - ny1) / float(size) | |||||
| offset_x2 = (x2 - nx2) / float(size) | |||||
| offset_y2 = (y2 - ny2) / float(size) | |||||
| cropped_im = img[ny1 : ny2, nx1 : nx2, :] | |||||
| resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR) | |||||
| box_ = box.reshape(1, -1) | |||||
| if IoU(crop_box, box_) >= 0.65: | |||||
| save_file = os.path.join(pos_save_dir, "%s.jpg"%p_idx) | |||||
| f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| elif IoU(crop_box, box_) >= 0.4: | |||||
| save_file = os.path.join(part_save_dir, "%s.jpg"%d_idx) | |||||
| f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| d_idx += 1 | |||||
| box_idx += 1 | |||||
| print "%s images done, pos: %s part: %s neg: %s"%(idx, p_idx, d_idx, n_idx) | |||||
| f1.close() | |||||
| f2.close() | |||||
| f3.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_pnet_data(args.dataset_path,args.annotation_file) | |||||
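The positive and part samples above store bounding-box regression targets as corner offsets normalized by the crop side length. A minimal sketch of that encoding and its inverse (how predicted offsets would be mapped back to pixel coordinates at inference; `encode_bbox`/`decode_bbox` are illustrative names, not functions from this repo, and the crops here are square so one `size` divisor suffices):

```python
def encode_bbox(gt, crop, size):
    # gt, crop: (x1, y1, x2, y2); size: side length of the square crop
    return tuple((g - c) / float(size) for g, c in zip(gt, crop))

def decode_bbox(offsets, crop, size):
    # inverse of encode_bbox: recover ground-truth corners from offsets
    return tuple(c + o * size for o, c in zip(offsets, crop))
```

Round-tripping a box through both functions returns the original corners, which is what makes the stored `%.2f` offsets usable as regression labels.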
| @@ -0,0 +1,219 @@ | |||||
| import argparse | |||||
| import cv2 | |||||
| import numpy as np | |||||
| from core.detect import MtcnnDetector,create_mtcnn_net | |||||
| from core.imagedb import ImageDB | |||||
| from core.image_reader import TestImageLoader | |||||
| import time | |||||
| import os | |||||
| import cPickle | |||||
| from core.utils import convert_to_square,IoU | |||||
| import config | |||||
| import core.vision as vision | |||||
| def gen_rnet_data(data_dir, anno_file, pnet_model_file, prefix_path='', use_cuda=True, vis=False): | |||||
| pnet, _, _ = create_mtcnn_net(p_model_path=pnet_model_file, use_cuda=use_cuda) | |||||
| mtcnn_detector = MtcnnDetector(pnet=pnet,min_face_size=12) | |||||
| imagedb = ImageDB(anno_file,mode="test",prefix_path=prefix_path) | |||||
| imdb = imagedb.load_imdb() | |||||
| image_reader = TestImageLoader(imdb,1,False) | |||||
| all_boxes = list() | |||||
| batch_idx = 0 | |||||
| for databatch in image_reader: | |||||
| if batch_idx % 100 == 0: | |||||
| print "%d images done" % batch_idx | |||||
| im = databatch | |||||
| t = time.time() | |||||
| boxes, boxes_align = mtcnn_detector.detect_pnet(im=im) | |||||
| if boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| if vis: | |||||
| rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB) | |||||
| vision.vis_two(rgb_im, boxes, boxes_align) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| all_boxes.append(boxes_align) | |||||
| batch_idx += 1 | |||||
| # save_path = model_store_path() | |||||
| save_path = config.MODEL_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.mkdir(save_path) | |||||
| save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time())) | |||||
| with open(save_file, 'wb') as f: | |||||
| cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL) | |||||
| gen_rnet_sample_data(data_dir,anno_file,save_file) | |||||
| def gen_rnet_sample_data(data_dir,anno_file,det_boxs_file): | |||||
| neg_save_dir = os.path.join(data_dir, "24/negative") | |||||
| pos_save_dir = os.path.join(data_dir, "24/positive") | |||||
| part_save_dir = os.path.join(data_dir, "24/part") | |||||
| for dir_path in [neg_save_dir, pos_save_dir, part_save_dir]: | |||||
| if not os.path.exists(dir_path): | |||||
| os.makedirs(dir_path) | |||||
| # load ground truth from annotation file | |||||
| # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| image_size = 24 | |||||
| net = "rnet" | |||||
| im_idx_list = list() | |||||
| gt_boxes_list = list() | |||||
| num_of_images = len(annotations) | |||||
| print "processing %d images in total" % num_of_images | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_idx = annotation[0] | |||||
| boxes = map(float, annotation[1:]) | |||||
| boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4) | |||||
| im_idx_list.append(im_idx) | |||||
| gt_boxes_list.append(boxes) | |||||
| save_path = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.makedirs(save_path) | |||||
| f1 = open(os.path.join(save_path, 'pos_%d.txt' % image_size), 'w') | |||||
| f2 = open(os.path.join(save_path, 'neg_%d.txt' % image_size), 'w') | |||||
| f3 = open(os.path.join(save_path, 'part_%d.txt' % image_size), 'w') | |||||
| # the detections were pickled in binary mode, so read them back in binary mode | |||||
| det_handle = open(det_boxs_file, 'rb') | |||||
| det_boxes = cPickle.load(det_handle) | |||||
| print len(det_boxes), num_of_images | |||||
| assert len(det_boxes) == num_of_images, "incorrect detections or ground truths" | |||||
| # index of neg, pos and part face, used as their image names | |||||
| n_idx = 0 | |||||
| p_idx = 0 | |||||
| d_idx = 0 | |||||
| image_done = 0 | |||||
| for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list): | |||||
| if image_done % 100 == 0: | |||||
| print "%d images done" % image_done | |||||
| image_done += 1 | |||||
| if dets.shape[0] == 0: | |||||
| continue | |||||
| img = cv2.imread(im_idx) | |||||
| dets = convert_to_square(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| for box in dets: | |||||
| x_left, y_top, x_right, y_bottom = box[0:4].astype(int) | |||||
| width = x_right - x_left + 1 | |||||
| height = y_bottom - y_top + 1 | |||||
| # ignore box that is too small or beyond image border | |||||
| if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1: | |||||
| continue | |||||
| # compute intersection over union(IoU) between current box and all gt boxes | |||||
| Iou = IoU(box, gts) | |||||
| cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (image_size, image_size), | |||||
| interpolation=cv2.INTER_LINEAR) | |||||
| # save negative images and write label | |||||
| if np.max(Iou) < 0.3: | |||||
| # Iou with all gts must below 0.3 | |||||
| save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx) | |||||
| f2.write(save_file + ' 0\n') | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| n_idx += 1 | |||||
| else: | |||||
| # find gt_box with the highest iou | |||||
| idx = np.argmax(Iou) | |||||
| assigned_gt = gts[idx] | |||||
| x1, y1, x2, y2 = assigned_gt | |||||
| # compute bbox reg label | |||||
| offset_x1 = (x1 - x_left) / float(width) | |||||
| offset_y1 = (y1 - y_top) / float(height) | |||||
| offset_x2 = (x2 - x_right) / float(width) | |||||
| offset_y2 = (y2 - y_bottom) / float(height) | |||||
| # save positive and part-face images and write labels | |||||
| if np.max(Iou) >= 0.65: | |||||
| save_file = os.path.join(pos_save_dir, "%s.jpg" % p_idx) | |||||
| f1.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| elif np.max(Iou) >= 0.4: | |||||
| save_file = os.path.join(part_save_dir, "%s.jpg" % d_idx) | |||||
| f3.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % ( | |||||
| offset_x1, offset_y1, offset_x2, offset_y2)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| d_idx += 1 | |||||
| f1.close() | |||||
| f2.close() | |||||
| f3.close() | |||||
| def model_store_path(): | |||||
| return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store" | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--pmodel_file', dest='pnet_model_file', help='PNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/pnet_epoch_5best.pt', type=str) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_rnet_data(args.dataset_path, args.annotation_file, args.pnet_model_file, args.prefix_path, args.use_cuda) | |||||
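Both hard-sample generators call `core.utils.convert_to_square` on the detections before cropping. Its body is not part of this diff; a plausible sketch, assuming the canonical MTCNN behavior of growing the shorter side of each detection around its center so the crop can be resized without distortion:

```python
import numpy as np

def convert_to_square(bboxes):
    # bboxes: N x 4+ array of (x1, y1, x2, y2, ...) detections
    square = bboxes.copy()
    w = bboxes[:, 2] - bboxes[:, 0] + 1
    h = bboxes[:, 3] - bboxes[:, 1] + 1
    side = np.maximum(w, h)
    # re-center each box, then expand both sides to the longer dimension
    square[:, 0] = bboxes[:, 0] + w * 0.5 - side * 0.5
    square[:, 1] = bboxes[:, 1] + h * 0.5 - side * 0.5
    square[:, 2] = square[:, 0] + side - 1
    square[:, 3] = square[:, 1] + side - 1
    return square
```

Note the resulting `x1` can go negative, which is why the sampling loops above skip boxes that fall outside the image border.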
| @@ -0,0 +1,156 @@ | |||||
| # coding: utf-8 | |||||
| import os | |||||
| import cv2 | |||||
| import numpy as np | |||||
| import sys | |||||
| import numpy.random as npr | |||||
| import argparse | |||||
| import config | |||||
| import core.utils as utils | |||||
| def gen_data(anno_file, data_dir, prefix): | |||||
| size = 12 | |||||
| image_id = 0 | |||||
| landmark_imgs_save_dir = os.path.join(data_dir,"12/landmark") | |||||
| if not os.path.exists(landmark_imgs_save_dir): | |||||
| os.makedirs(landmark_imgs_save_dir) | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(anno_dir): | |||||
| os.makedirs(anno_dir) | |||||
| landmark_anno_filename = config.PNET_LANDMARK_ANNO_FILENAME | |||||
| save_landmark_anno = os.path.join(anno_dir,landmark_anno_filename) | |||||
| f = open(save_landmark_anno, 'w') | |||||
| # dstdir = "train_landmark_few" | |||||
| with open(anno_file, 'r') as f2: | |||||
| annotations = f2.readlines() | |||||
| num = len(annotations) | |||||
| print "%d pics in total" % num | |||||
| l_idx =0 | |||||
| idx = 0 | |||||
| # image_path bbox landmark(5*2) | |||||
| for annotation in annotations: | |||||
| # print imgPath | |||||
| annotation = annotation.strip().split(' ') | |||||
| assert len(annotation)==15,"each line should have 15 elements" | |||||
| im_path = os.path.join(prefix,annotation[0].replace("\\", "/")) | |||||
| gt_box = map(float, annotation[1:5]) | |||||
| gt_box = [gt_box[0], gt_box[2], gt_box[1], gt_box[3]] | |||||
| gt_box = np.array(gt_box, dtype=np.int32) | |||||
| landmark = map(float, annotation[5:]) | |||||
| landmark = np.array(landmark, dtype=np.float) | |||||
| img = cv2.imread(im_path) | |||||
| assert (img is not None) | |||||
| height, width, channel = img.shape | |||||
| # crop_face = img[gt_box[1]:gt_box[3]+1, gt_box[0]:gt_box[2]+1] | |||||
| # crop_face = cv2.resize(crop_face,(size,size)) | |||||
| idx = idx + 1 | |||||
| if idx % 100 == 0: | |||||
| print "%d images done, landmark images: %d"%(idx,l_idx) | |||||
| x1, y1, x2, y2 = gt_box | |||||
| # gt's width | |||||
| w = x2 - x1 + 1 | |||||
| # gt's height | |||||
| h = y2 - y1 + 1 | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # random shift | |||||
| for i in range(10): | |||||
| bbox_size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h))) | |||||
| delta_x = npr.randint(-w * 0.2, w * 0.2) | |||||
| delta_y = npr.randint(-h * 0.2, h * 0.2) | |||||
| nx1 = max(x1 + w / 2 - bbox_size / 2 + delta_x, 0) | |||||
| ny1 = max(y1 + h / 2 - bbox_size / 2 + delta_y, 0) | |||||
| nx2 = nx1 + bbox_size | |||||
| ny2 = ny1 + bbox_size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| cropped_im = img[ny1:ny2 + 1, nx1:nx2 + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (size, size),interpolation=cv2.INTER_LINEAR) | |||||
| offset_x1 = (x1 - nx1) / float(bbox_size) | |||||
| offset_y1 = (y1 - ny1) / float(bbox_size) | |||||
| offset_x2 = (x2 - nx2) / float(bbox_size) | |||||
| offset_y2 = (y2 - ny2) / float(bbox_size) | |||||
| offset_left_eye_x = (landmark[0] - nx1) / float(bbox_size) | |||||
| offset_left_eye_y = (landmark[1] - ny1) / float(bbox_size) | |||||
| offset_right_eye_x = (landmark[2] - nx1) / float(bbox_size) | |||||
| offset_right_eye_y = (landmark[3] - ny1) / float(bbox_size) | |||||
| offset_nose_x = (landmark[4] - nx1) / float(bbox_size) | |||||
| offset_nose_y = (landmark[5] - ny1) / float(bbox_size) | |||||
| offset_left_mouth_x = (landmark[6] - nx1) / float(bbox_size) | |||||
| offset_left_mouth_y = (landmark[7] - ny1) / float(bbox_size) | |||||
| offset_right_mouth_x = (landmark[8] - nx1) / float(bbox_size) | |||||
| offset_right_mouth_y = (landmark[9] - ny1) / float(bbox_size) | |||||
| # cal iou | |||||
| iou = utils.IoU(crop_box.astype(np.float), np.expand_dims(gt_box.astype(np.float), 0)) | |||||
| if np.max(iou) > 0.65: | |||||
| save_file = os.path.join(landmark_imgs_save_dir, "%s.jpg" % l_idx) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x,offset_left_eye_y,offset_right_eye_x,offset_right_eye_y,offset_nose_x,offset_nose_y,offset_left_mouth_x,offset_left_mouth_y,offset_right_mouth_x,offset_right_mouth_y)) | |||||
| l_idx += 1 | |||||
| f.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Test mtcnn', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='../data/', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_data(args.annotation_file, args.dataset_path, args.prefix_path) | |||||
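The ten landmark offsets written above are all computed the same way: subtract the crop origin and divide by the crop side. A compact sketch of that normalization and its inverse (`encode_landmarks`/`decode_landmarks` are illustrative names, not part of this repo), which replaces the ten hand-written `offset_*` lines with one vectorized step:

```python
import numpy as np

def encode_landmarks(landmark, nx1, ny1, bbox_size):
    # landmark: flat array [x1, y1, ..., x5, y5] in image coordinates
    pts = np.asarray(landmark, dtype=float).reshape(-1, 2)
    return ((pts - np.array([nx1, ny1], dtype=float)) / float(bbox_size)).ravel()

def decode_landmarks(offsets, nx1, ny1, bbox_size):
    # inverse: recover image-coordinate landmarks from normalized offsets
    pts = np.asarray(offsets, dtype=float).reshape(-1, 2)
    return (pts * float(bbox_size) + np.array([nx1, ny1], dtype=float)).ravel()
```

Because the offsets are relative to the crop, the same labels remain valid after the crop is resized to the 12x12 (or 24x24/48x48) network input.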
| @@ -0,0 +1,154 @@ | |||||
| # coding: utf-8 | |||||
| import os | |||||
| import cv2 | |||||
| import numpy as np | |||||
| import random | |||||
| import sys | |||||
| import numpy.random as npr | |||||
| import argparse | |||||
| import config | |||||
| import core.utils as utils | |||||
| def gen_data(anno_file, data_dir, prefix): | |||||
| size = 24 | |||||
| image_id = 0 | |||||
| landmark_imgs_save_dir = os.path.join(data_dir,"24/landmark") | |||||
| if not os.path.exists(landmark_imgs_save_dir): | |||||
| os.makedirs(landmark_imgs_save_dir) | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(anno_dir): | |||||
| os.makedirs(anno_dir) | |||||
| landmark_anno_filename = config.RNET_LANDMARK_ANNO_FILENAME | |||||
| save_landmark_anno = os.path.join(anno_dir,landmark_anno_filename) | |||||
| f = open(save_landmark_anno, 'w') | |||||
| # dstdir = "train_landmark_few" | |||||
| with open(anno_file, 'r') as f2: | |||||
| annotations = f2.readlines() | |||||
| num = len(annotations) | |||||
| print "%d total images" % num | |||||
| l_idx =0 | |||||
| idx = 0 | |||||
| # image_path bbox landmark(5*2) | |||||
| for annotation in annotations: | |||||
| # print imgPath | |||||
| annotation = annotation.strip().split(' ') | |||||
| assert len(annotation)==15,"each line should have 15 elements" | |||||
| im_path = os.path.join(prefix,annotation[0].replace("\\", "/")) | |||||
| gt_box = map(float, annotation[1:5]) | |||||
| gt_box = [gt_box[0], gt_box[2], gt_box[1], gt_box[3]] | |||||
| gt_box = np.array(gt_box, dtype=np.int32) | |||||
| landmark = map(float, annotation[5:]) | |||||
| landmark = np.array(landmark, dtype=np.float) | |||||
| img = cv2.imread(im_path) | |||||
| assert (img is not None) | |||||
| height, width, channel = img.shape | |||||
| # crop_face = img[gt_box[1]:gt_box[3]+1, gt_box[0]:gt_box[2]+1] | |||||
| # crop_face = cv2.resize(crop_face,(size,size)) | |||||
| idx = idx + 1 | |||||
| if idx % 100 == 0: | |||||
| print "%d images done, landmark images: %d"%(idx,l_idx) | |||||
| x1, y1, x2, y2 = gt_box | |||||
| # gt's width | |||||
| w = x2 - x1 + 1 | |||||
| # gt's height | |||||
| h = y2 - y1 + 1 | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # random shift | |||||
| for i in range(10): | |||||
| bbox_size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h))) | |||||
| delta_x = npr.randint(-w * 0.2, w * 0.2) | |||||
| delta_y = npr.randint(-h * 0.2, h * 0.2) | |||||
| nx1 = max(x1 + w / 2 - bbox_size / 2 + delta_x, 0) | |||||
| ny1 = max(y1 + h / 2 - bbox_size / 2 + delta_y, 0) | |||||
| nx2 = nx1 + bbox_size | |||||
| ny2 = ny1 + bbox_size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| cropped_im = img[ny1:ny2 + 1, nx1:nx2 + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (size, size),interpolation=cv2.INTER_LINEAR) | |||||
| offset_x1 = (x1 - nx1) / float(bbox_size) | |||||
| offset_y1 = (y1 - ny1) / float(bbox_size) | |||||
| offset_x2 = (x2 - nx2) / float(bbox_size) | |||||
| offset_y2 = (y2 - ny2) / float(bbox_size) | |||||
| offset_left_eye_x = (landmark[0] - nx1) / float(bbox_size) | |||||
| offset_left_eye_y = (landmark[1] - ny1) / float(bbox_size) | |||||
| offset_right_eye_x = (landmark[2] - nx1) / float(bbox_size) | |||||
| offset_right_eye_y = (landmark[3] - ny1) / float(bbox_size) | |||||
| offset_nose_x = (landmark[4] - nx1) / float(bbox_size) | |||||
| offset_nose_y = (landmark[5] - ny1) / float(bbox_size) | |||||
| offset_left_mouth_x = (landmark[6] - nx1) / float(bbox_size) | |||||
| offset_left_mouth_y = (landmark[7] - ny1) / float(bbox_size) | |||||
| offset_right_mouth_x = (landmark[8] - nx1) / float(bbox_size) | |||||
| offset_right_mouth_y = (landmark[9] - ny1) / float(bbox_size) | |||||
| # cal iou | |||||
| iou = utils.IoU(crop_box.astype(np.float), np.expand_dims(gt_box.astype(np.float), 0)) | |||||
| if iou > 0.65: | |||||
| save_file = os.path.join(landmark_imgs_save_dir, "%s.jpg" % l_idx) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x,offset_left_eye_y,offset_right_eye_x,offset_right_eye_y,offset_nose_x,offset_nose_y,offset_left_mouth_x,offset_left_mouth_y,offset_right_mouth_x,offset_right_mouth_y)) | |||||
| l_idx += 1 | |||||
| f.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Generate landmark training data', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='/idata/data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='/idata/data/trainImageList.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='/idata/data', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_data(args.annotation_file, args.dataset_path, args.prefix_path) | |||||
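The crop-relative offset encoding used above (each ground-truth coordinate expressed as a fraction of the square crop's side, e.g. `offset_x1 = (x1 - nx1) / bbox_size`) round-trips cleanly. A minimal sketch; `encode_offsets`/`decode_offsets` are illustrative helpers, not part of DFace:

```python
import numpy as np

def encode_offsets(gt_box, crop_box):
    """Express gt corners as fractions of the (square) crop size,
    mirroring offset_x1 = (x1 - nx1) / bbox_size in the code above."""
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)
    x1, y1, x2, y2 = gt_box
    return np.array([(x1 - nx1) / size, (y1 - ny1) / size,
                     (x2 - nx2) / size, (y2 - ny2) / size])

def decode_offsets(offsets, crop_box):
    """Invert the encoding to recover the gt box from a crop."""
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)
    dx1, dy1, dx2, dy2 = offsets
    return np.array([nx1 + dx1 * size, ny1 + dy1 * size,
                     nx2 + dx2 * size, ny2 + dy2 * size])

crop = (90, 90, 190, 190)          # a 100x100 square crop
gt = (100, 95, 180, 185)           # ground-truth box inside it
off = encode_offsets(gt, crop)
recovered = decode_offsets(off, crop)
```

Because the network predicts these normalized offsets, the decoder is what turns its output back into pixel coordinates at inference time.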
| @@ -0,0 +1,153 @@ | |||||
| # coding: utf-8 | |||||
| import os | |||||
| import cv2 | |||||
| import numpy as np | |||||
| import random | |||||
| import sys | |||||
| import numpy.random as npr | |||||
| import argparse | |||||
| import config | |||||
| import core.utils as utils | |||||
| def gen_data(anno_file, data_dir, prefix): | |||||
| size = 48 | |||||
| image_id = 0 | |||||
| landmark_imgs_save_dir = os.path.join(data_dir,"48/landmark") | |||||
| if not os.path.exists(landmark_imgs_save_dir): | |||||
| os.makedirs(landmark_imgs_save_dir) | |||||
| anno_dir = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(anno_dir): | |||||
| os.makedirs(anno_dir) | |||||
| landmark_anno_filename = config.ONET_LANDMARK_ANNO_FILENAME | |||||
| save_landmark_anno = os.path.join(anno_dir,landmark_anno_filename) | |||||
| f = open(save_landmark_anno, 'w') | |||||
| # dstdir = "train_landmark_few" | |||||
| with open(anno_file, 'r') as f2: | |||||
| annotations = f2.readlines() | |||||
| num = len(annotations) | |||||
| print "%d total images" % num | |||||
| l_idx = 0 | |||||
| idx = 0 | |||||
| # image_path bbox landmark(5*2) | |||||
| for annotation in annotations: | |||||
| # print imgPath | |||||
| annotation = annotation.strip().split(' ') | |||||
| assert len(annotation)==15, "each line should have 15 elements" | |||||
| im_path = os.path.join(prefix,annotation[0].replace("\\", "/")) | |||||
| gt_box = map(float, annotation[1:5]) | |||||
| # gt_box = [gt_box[0], gt_box[2], gt_box[1], gt_box[3]] | |||||
| gt_box = np.array(gt_box, dtype=np.int32) | |||||
| landmark = map(float, annotation[5:]) | |||||
| landmark = np.array(landmark, dtype=np.float) | |||||
| img = cv2.imread(im_path) | |||||
| assert (img is not None) | |||||
| height, width, channel = img.shape | |||||
| # crop_face = img[gt_box[1]:gt_box[3]+1, gt_box[0]:gt_box[2]+1] | |||||
| # crop_face = cv2.resize(crop_face,(size,size)) | |||||
| idx = idx + 1 | |||||
| if idx % 100 == 0: | |||||
| print "%d images done, landmark images: %d"%(idx,l_idx) | |||||
| x1, y1, x2, y2 = gt_box | |||||
| # gt's width | |||||
| w = x2 - x1 + 1 | |||||
| # gt's height | |||||
| h = y2 - y1 + 1 | |||||
| if max(w, h) < 40 or x1 < 0 or y1 < 0: | |||||
| continue | |||||
| # random shift | |||||
| for i in range(10): | |||||
| bbox_size = npr.randint(int(min(w, h) * 0.8), int(np.ceil(1.25 * max(w, h)))) | |||||
| delta_x = npr.randint(int(-w * 0.2), int(w * 0.2)) | |||||
| delta_y = npr.randint(int(-h * 0.2), int(h * 0.2)) | |||||
| nx1 = max(x1 + w / 2 - bbox_size / 2 + delta_x, 0) | |||||
| ny1 = max(y1 + h / 2 - bbox_size / 2 + delta_y, 0) | |||||
| nx2 = nx1 + bbox_size | |||||
| ny2 = ny1 + bbox_size | |||||
| if nx2 > width or ny2 > height: | |||||
| continue | |||||
| crop_box = np.array([nx1, ny1, nx2, ny2]) | |||||
| cropped_im = img[ny1:ny2 + 1, nx1:nx2 + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (size, size),interpolation=cv2.INTER_LINEAR) | |||||
| offset_x1 = (x1 - nx1) / float(bbox_size) | |||||
| offset_y1 = (y1 - ny1) / float(bbox_size) | |||||
| offset_x2 = (x2 - nx2) / float(bbox_size) | |||||
| offset_y2 = (y2 - ny2) / float(bbox_size) | |||||
| offset_left_eye_x = (landmark[0] - nx1) / float(bbox_size) | |||||
| offset_left_eye_y = (landmark[1] - ny1) / float(bbox_size) | |||||
| offset_right_eye_x = (landmark[2] - nx1) / float(bbox_size) | |||||
| offset_right_eye_y = (landmark[3] - ny1) / float(bbox_size) | |||||
| offset_nose_x = (landmark[4] - nx1) / float(bbox_size) | |||||
| offset_nose_y = (landmark[5] - ny1) / float(bbox_size) | |||||
| offset_left_mouth_x = (landmark[6] - nx1) / float(bbox_size) | |||||
| offset_left_mouth_y = (landmark[7] - ny1) / float(bbox_size) | |||||
| offset_right_mouth_x = (landmark[8] - nx1) / float(bbox_size) | |||||
| offset_right_mouth_y = (landmark[9] - ny1) / float(bbox_size) | |||||
| # cal iou | |||||
| iou = utils.IoU(crop_box.astype(np.float), np.expand_dims(gt_box.astype(np.float), 0)) | |||||
| if iou > 0.65: | |||||
| save_file = os.path.join(landmark_imgs_save_dir, "%s.jpg" % l_idx) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x,offset_left_eye_y,offset_right_eye_x,offset_right_eye_y,offset_nose_x,offset_nose_y,offset_left_mouth_x,offset_left_mouth_y,offset_right_mouth_x,offset_right_mouth_y)) | |||||
| l_idx += 1 | |||||
| f.close() | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Generate landmark training data', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='/idata/data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='/idata/data/trainImageList.txt', type=str) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='/idata/data', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_data(args.annotation_file, args.dataset_path, args.prefix_path) | |||||
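The script above keeps a crop only when `utils.IoU` against the ground-truth box exceeds 0.65. A minimal one-box-vs-many IoU sketch (an assumption about the helper's behaviour, not DFace's exact `core.utils.IoU` implementation):

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box [x1, y1, x2, y2] against an (N, 4) array of boxes,
    using the same +1 pixel convention as the width/height code above."""
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    # clamp negative widths/heights to zero for disjoint boxes
    inter = np.maximum(0, xx2 - xx1 + 1) * np.maximum(0, yy2 - yy1 + 1)
    return inter / (area + areas - inter)

gt = np.array([[0., 0., 9., 9.]])              # one 10x10 ground-truth box
same = iou(np.array([0., 0., 9., 9.]), gt)     # identical box
far = iou(np.array([20., 20., 29., 29.]), gt)  # disjoint box
```

The 0.65 threshold then selects only crops that tightly overlap a real face, which is what makes them usable as landmark-regression positives.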
| @@ -0,0 +1,234 @@ | |||||
| import argparse | |||||
| import cv2 | |||||
| import numpy as np | |||||
| from core.detect import MtcnnDetector,create_mtcnn_net | |||||
| from core.imagedb import ImageDB | |||||
| from core.image_reader import TestImageLoader | |||||
| import time | |||||
| import os | |||||
| import cPickle | |||||
| from core.utils import convert_to_square,IoU | |||||
| import config | |||||
| import core.vision as vision | |||||
| def gen_landmark48_data(data_dir, anno_file, pnet_model_file, rnet_model_file, prefix_path='', use_cuda=True, vis=False): | |||||
| pnet, rnet, _ = create_mtcnn_net(p_model_path=pnet_model_file, r_model_path=rnet_model_file, use_cuda=use_cuda) | |||||
| mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, min_face_size=12) | |||||
| imagedb = ImageDB(anno_file,mode="test",prefix_path=prefix_path) | |||||
| imdb = imagedb.load_imdb() | |||||
| image_reader = TestImageLoader(imdb,1,False) | |||||
| all_boxes = list() | |||||
| batch_idx = 0 | |||||
| for databatch in image_reader: | |||||
| if batch_idx % 100 == 0: | |||||
| print "%d images done" % batch_idx | |||||
| im = databatch | |||||
| if im.shape[0] >= 1200 or im.shape[1] >=1200: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| t = time.time() | |||||
| p_boxes, p_boxes_align = mtcnn_detector.detect_pnet(im=im) | |||||
| # guard against images where PNet finds no candidates before calling RNet | |||||
| if p_boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| boxes, boxes_align = mtcnn_detector.detect_rnet(im=im, dets=p_boxes_align) | |||||
| if boxes_align is None: | |||||
| all_boxes.append(np.array([])) | |||||
| batch_idx += 1 | |||||
| continue | |||||
| if vis: | |||||
| rgb_im = cv2.cvtColor(np.asarray(im), cv2.COLOR_BGR2RGB) | |||||
| vision.vis_two(rgb_im, boxes, boxes_align) | |||||
| t1 = time.time() - t | |||||
| t = time.time() | |||||
| all_boxes.append(boxes_align) | |||||
| batch_idx += 1 | |||||
| save_path = config.MODEL_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.mkdir(save_path) | |||||
| save_file = os.path.join(save_path, "detections_%d.pkl" % int(time.time())) | |||||
| with open(save_file, 'wb') as f: | |||||
| cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL) | |||||
| gen_sample_data(data_dir,anno_file,save_file, prefix_path) | |||||
| def gen_sample_data(data_dir, anno_file, det_boxs_file, prefix_path =''): | |||||
| landmark_save_dir = os.path.join(data_dir, "48/landmark") | |||||
| if not os.path.exists(landmark_save_dir): | |||||
| os.makedirs(landmark_save_dir) | |||||
| # load ground truth from annotation file | |||||
| # format of each line: image/path [x1,y1,x2,y2] for each gt_box in this image | |||||
| with open(anno_file, 'r') as f: | |||||
| annotations = f.readlines() | |||||
| image_size = 48 | |||||
| net = "onet" | |||||
| im_idx_list = list() | |||||
| gt_boxes_list = list() | |||||
| gt_landmark_list = list() | |||||
| num_of_images = len(annotations) | |||||
| print "processing %d images in total" % num_of_images | |||||
| for annotation in annotations: | |||||
| annotation = annotation.strip().split(' ') | |||||
| im_idx = annotation[0] | |||||
| boxes = map(float, annotation[1:5]) | |||||
| boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4) | |||||
| landmarks = map(float, annotation[5:]) | |||||
| landmarks = np.array(landmarks, dtype=np.float32).reshape(-1, 10) | |||||
| im_idx_list.append(im_idx) | |||||
| gt_boxes_list.append(boxes) | |||||
| gt_landmark_list.append(landmarks) | |||||
| save_path = config.ANNO_STORE_DIR | |||||
| if not os.path.exists(save_path): | |||||
| os.makedirs(save_path) | |||||
| f = open(os.path.join(save_path, 'landmark_48.txt'), 'w') | |||||
| det_handle = open(det_boxs_file, 'rb')  # binary mode for cPickle | |||||
| det_boxes = cPickle.load(det_handle) | |||||
| print len(det_boxes), num_of_images | |||||
| assert len(det_boxes) == num_of_images, "incorrect detections or ground truths" | |||||
| # index of saved landmark (positive) crops, used as their image file names | |||||
| p_idx = 0 | |||||
| image_done = 0 | |||||
| for im_idx, dets, gts, landmark in zip(im_idx_list, det_boxes, gt_boxes_list, gt_landmark_list): | |||||
| if image_done % 100 == 0: | |||||
| print "%d images done" % image_done | |||||
| image_done += 1 | |||||
| if dets.shape[0] == 0: | |||||
| continue | |||||
| img = cv2.imread(os.path.join(prefix_path,im_idx)) | |||||
| dets = convert_to_square(dets) | |||||
| dets[:, 0:4] = np.round(dets[:, 0:4]) | |||||
| for box in dets: | |||||
| x_left, y_top, x_right, y_bottom = box[0:4].astype(int) | |||||
| width = x_right - x_left + 1 | |||||
| height = y_bottom - y_top + 1 | |||||
| # ignore box that is too small or beyond image border | |||||
| if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1: | |||||
| continue | |||||
| # compute intersection over union(IoU) between current box and all gt boxes | |||||
| Iou = IoU(box, gts) | |||||
| cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :] | |||||
| resized_im = cv2.resize(cropped_im, (image_size, image_size), | |||||
| interpolation=cv2.INTER_LINEAR) | |||||
| # skip crops that overlap no ground truth; negatives are not needed for landmarks | |||||
| if np.max(Iou) < 0.3: | |||||
| # IoU with all gts must be below 0.3 | |||||
| continue | |||||
| else: | |||||
| # find gt_box with the highest iou | |||||
| idx = np.argmax(Iou) | |||||
| assigned_gt = gts[idx] | |||||
| x1, y1, x2, y2 = assigned_gt | |||||
| # compute bbox reg label | |||||
| offset_x1 = (x1 - x_left) / float(width) | |||||
| offset_y1 = (y1 - y_top) / float(height) | |||||
| offset_x2 = (x2 - x_right) / float(width) | |||||
| offset_y2 = (y2 - y_bottom) / float(height) | |||||
| offset_left_eye_x = (landmark[0,0] - x_left) / float(width) | |||||
| offset_left_eye_y = (landmark[0,1] - y_top) / float(height) | |||||
| offset_right_eye_x = (landmark[0,2] - x_left) / float(width) | |||||
| offset_right_eye_y = (landmark[0,3] - y_top) / float(height) | |||||
| offset_nose_x = (landmark[0,4] - x_left) / float(width) | |||||
| offset_nose_y = (landmark[0,5] - y_top) / float(height) | |||||
| offset_left_mouth_x = (landmark[0,6] - x_left) / float(width) | |||||
| offset_left_mouth_y = (landmark[0,7] - y_top) / float(height) | |||||
| offset_right_mouth_x = (landmark[0,8] - x_left) / float(width) | |||||
| offset_right_mouth_y = (landmark[0,9] - y_top) / float(height) | |||||
| # save positive samples (IoU >= 0.65) and write bbox + landmark labels | |||||
| if np.max(Iou) >= 0.65: | |||||
| save_file = os.path.join(landmark_save_dir, "%s.jpg" % p_idx) | |||||
| f.write(save_file + ' -2 %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f %.2f \n' % \ | |||||
| (offset_x1, offset_y1, offset_x2, offset_y2, \ | |||||
| offset_left_eye_x, offset_left_eye_y, offset_right_eye_x, offset_right_eye_y, | |||||
| offset_nose_x, offset_nose_y, offset_left_mouth_x, offset_left_mouth_y, | |||||
| offset_right_mouth_x, offset_right_mouth_y)) | |||||
| cv2.imwrite(save_file, resized_im) | |||||
| p_idx += 1 | |||||
| f.close() | |||||
| def model_store_path(): | |||||
| return os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))+"/model_store" | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Generate ONet landmark training data', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--dataset_path', dest='dataset_path', help='dataset folder', | |||||
| default='../data/wider/', type=str) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', help='dataset original annotation file', | |||||
| default='../data/wider/anno.txt', type=str) | |||||
| parser.add_argument('--pmodel_file', dest='pnet_model_file', help='PNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/pnet_epoch_5best.pt', type=str) | |||||
| parser.add_argument('--rmodel_file', dest='rnet_model_file', help='RNet model file path', | |||||
| default='/idata/workspace/mtcnn/model_store/rnet_epoch_1.pt', type=str) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='image prefix root path', | |||||
| default='', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| gen_landmark48_data(args.dataset_path, args.annotation_file, args.pnet_model_file, args.rnet_model_file, args.prefix_path, args.use_cuda) | |||||
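Before cropping, the script calls `convert_to_square` so every RNet detection becomes a square region around the same centre. A plausible sketch of that helper (assumed behaviour; the real implementation lives in `core.utils`):

```python
import numpy as np

def convert_to_square(bboxes):
    """Expand [x1, y1, x2, y2, ...] rows to squares with the same centre
    and side max(w, h). Extra columns (e.g. scores) are preserved."""
    square = bboxes.copy()
    w = bboxes[:, 2] - bboxes[:, 0] + 1
    h = bboxes[:, 3] - bboxes[:, 1] + 1
    side = np.maximum(w, h)
    square[:, 0] = bboxes[:, 0] + w * 0.5 - side * 0.5
    square[:, 1] = bboxes[:, 1] + h * 0.5 - side * 0.5
    square[:, 2] = square[:, 0] + side - 1
    square[:, 3] = square[:, 1] + side - 1
    return square

dets = np.array([[10., 10., 29., 49., 0.9]])   # a 20x40 box with a score column
sq = convert_to_square(dets)
```

Square crops matter here because the crop is then resized to 48x48; a non-square crop would distort face geometry and corrupt the landmark labels.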
| @@ -0,0 +1,281 @@ | |||||
| from core.image_reader import TrainImageReader | |||||
| import datetime | |||||
| import os | |||||
| from core.models import PNet,RNet,ONet,LossFn | |||||
| import torch | |||||
| from torch.autograd import Variable | |||||
| import core.image_tools as image_tools | |||||
| def compute_accuracy(prob_cls, gt_cls): | |||||
| prob_cls = torch.squeeze(prob_cls) | |||||
| gt_cls = torch.squeeze(gt_cls) | |||||
| #we only need the detection which >= 0 | |||||
| mask = torch.ge(gt_cls,0) | |||||
| #get valid element | |||||
| valid_gt_cls = torch.masked_select(gt_cls,mask) | |||||
| valid_prob_cls = torch.masked_select(prob_cls,mask) | |||||
| size = min(valid_gt_cls.size()[0], valid_prob_cls.size()[0]) | |||||
| prob_ones = torch.ge(valid_prob_cls,0.6).float() | |||||
| right_ones = torch.eq(prob_ones,valid_gt_cls).float() | |||||
| return torch.div(torch.mul(torch.sum(right_ones),float(1.0)),float(size)) | |||||
| def train_pnet(model_store_path, end_epoch,imdb, | |||||
| batch_size,frequent=50,base_lr=0.01,use_cuda=True): | |||||
| if not os.path.exists(model_store_path): | |||||
| os.makedirs(model_store_path) | |||||
| lossfn = LossFn() | |||||
| net = PNet(is_train=True, use_cuda=use_cuda) | |||||
| net.train() | |||||
| if use_cuda: | |||||
| net.cuda() | |||||
| optimizer = torch.optim.Adam(net.parameters(), lr=base_lr) | |||||
| train_data=TrainImageReader(imdb,12,batch_size,shuffle=True) | |||||
| for cur_epoch in range(1,end_epoch+1): | |||||
| train_data.reset() | |||||
| accuracy_list=[] | |||||
| cls_loss_list=[] | |||||
| bbox_loss_list=[] | |||||
| # landmark_loss_list=[] | |||||
| for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data): | |||||
| im_tensor = [ image_tools.convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ] | |||||
| im_tensor = torch.stack(im_tensor) | |||||
| im_tensor = Variable(im_tensor) | |||||
| gt_label = Variable(torch.from_numpy(gt_label).float()) | |||||
| gt_bbox = Variable(torch.from_numpy(gt_bbox).float()) | |||||
| # gt_landmark = Variable(torch.from_numpy(gt_landmark).float()) | |||||
| if use_cuda: | |||||
| im_tensor = im_tensor.cuda() | |||||
| gt_label = gt_label.cuda() | |||||
| gt_bbox = gt_bbox.cuda() | |||||
| # gt_landmark = gt_landmark.cuda() | |||||
| cls_pred, box_offset_pred = net(im_tensor) | |||||
| # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred) | |||||
| cls_loss = lossfn.cls_loss(gt_label,cls_pred) | |||||
| box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred) | |||||
| # landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred) | |||||
| all_loss = cls_loss*1.0+box_offset_loss*0.5 | |||||
| if batch_idx%frequent==0: | |||||
| accuracy=compute_accuracy(cls_pred,gt_label) | |||||
| show1 = accuracy.data.tolist()[0] | |||||
| show2 = cls_loss.data.tolist()[0] | |||||
| show3 = box_offset_loss.data.tolist()[0] | |||||
| show5 = all_loss.data.tolist()[0] | |||||
| print "%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(),cur_epoch,batch_idx, show1,show2,show3,show5,base_lr) | |||||
| accuracy_list.append(accuracy) | |||||
| cls_loss_list.append(cls_loss) | |||||
| bbox_loss_list.append(box_offset_loss) | |||||
| optimizer.zero_grad() | |||||
| all_loss.backward() | |||||
| optimizer.step() | |||||
| accuracy_avg = torch.mean(torch.cat(accuracy_list)) | |||||
| cls_loss_avg = torch.mean(torch.cat(cls_loss_list)) | |||||
| bbox_loss_avg = torch.mean(torch.cat(bbox_loss_list)) | |||||
| # landmark_loss_avg = torch.mean(torch.cat(landmark_loss_list)) | |||||
| show6 = accuracy_avg.data.tolist()[0] | |||||
| show7 = cls_loss_avg.data.tolist()[0] | |||||
| show8 = bbox_loss_avg.data.tolist()[0] | |||||
| print "Epoch: %d, accuracy: %s, cls loss: %s, bbox loss: %s" % (cur_epoch, show6, show7, show8) | |||||
| torch.save(net.state_dict(), os.path.join(model_store_path,"pnet_epoch_%d.pt" % cur_epoch)) | |||||
| torch.save(net, os.path.join(model_store_path,"pnet_epoch_model_%d.pkl" % cur_epoch)) | |||||
| def train_rnet(model_store_path, end_epoch,imdb, | |||||
| batch_size,frequent=50,base_lr=0.01,use_cuda=True): | |||||
| if not os.path.exists(model_store_path): | |||||
| os.makedirs(model_store_path) | |||||
| lossfn = LossFn() | |||||
| net = RNet(is_train=True, use_cuda=use_cuda) | |||||
| net.train() | |||||
| if use_cuda: | |||||
| net.cuda() | |||||
| optimizer = torch.optim.Adam(net.parameters(), lr=base_lr) | |||||
| train_data=TrainImageReader(imdb,24,batch_size,shuffle=True) | |||||
| for cur_epoch in range(1,end_epoch+1): | |||||
| train_data.reset() | |||||
| accuracy_list=[] | |||||
| cls_loss_list=[] | |||||
| bbox_loss_list=[] | |||||
| landmark_loss_list=[] | |||||
| for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data): | |||||
| im_tensor = [ image_tools.convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ] | |||||
| im_tensor = torch.stack(im_tensor) | |||||
| im_tensor = Variable(im_tensor) | |||||
| gt_label = Variable(torch.from_numpy(gt_label).float()) | |||||
| gt_bbox = Variable(torch.from_numpy(gt_bbox).float()) | |||||
| gt_landmark = Variable(torch.from_numpy(gt_landmark).float()) | |||||
| if use_cuda: | |||||
| im_tensor = im_tensor.cuda() | |||||
| gt_label = gt_label.cuda() | |||||
| gt_bbox = gt_bbox.cuda() | |||||
| gt_landmark = gt_landmark.cuda() | |||||
| cls_pred, box_offset_pred = net(im_tensor) | |||||
| # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred) | |||||
| cls_loss = lossfn.cls_loss(gt_label,cls_pred) | |||||
| box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred) | |||||
| # landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred) | |||||
| all_loss = cls_loss*1.0+box_offset_loss*0.5 | |||||
| if batch_idx%frequent==0: | |||||
| accuracy=compute_accuracy(cls_pred,gt_label) | |||||
| show1 = accuracy.data.tolist()[0] | |||||
| show2 = cls_loss.data.tolist()[0] | |||||
| show3 = box_offset_loss.data.tolist()[0] | |||||
| # show4 = landmark_loss.data.tolist()[0] | |||||
| show5 = all_loss.data.tolist()[0] | |||||
| print "%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(), cur_epoch, batch_idx, show1, show2, show3, show5, base_lr) | |||||
| accuracy_list.append(accuracy) | |||||
| cls_loss_list.append(cls_loss) | |||||
| bbox_loss_list.append(box_offset_loss) | |||||
| # landmark_loss_list.append(landmark_loss) | |||||
| optimizer.zero_grad() | |||||
| all_loss.backward() | |||||
| optimizer.step() | |||||
| accuracy_avg = torch.mean(torch.cat(accuracy_list)) | |||||
| cls_loss_avg = torch.mean(torch.cat(cls_loss_list)) | |||||
| bbox_loss_avg = torch.mean(torch.cat(bbox_loss_list)) | |||||
| # landmark_loss_avg = torch.mean(torch.cat(landmark_loss_list)) | |||||
| show6 = accuracy_avg.data.tolist()[0] | |||||
| show7 = cls_loss_avg.data.tolist()[0] | |||||
| show8 = bbox_loss_avg.data.tolist()[0] | |||||
| # show9 = landmark_loss_avg.data.tolist()[0] | |||||
| print "Epoch: %d, accuracy: %s, cls loss: %s, bbox loss: %s" % (cur_epoch, show6, show7, show8) | |||||
| torch.save(net.state_dict(), os.path.join(model_store_path,"rnet_epoch_%d.pt" % cur_epoch)) | |||||
| torch.save(net, os.path.join(model_store_path,"rnet_epoch_model_%d.pkl" % cur_epoch)) | |||||
| def train_onet(model_store_path, end_epoch,imdb, | |||||
| batch_size,frequent=50,base_lr=0.01,use_cuda=True): | |||||
| if not os.path.exists(model_store_path): | |||||
| os.makedirs(model_store_path) | |||||
| lossfn = LossFn() | |||||
| net = ONet(is_train=True, use_cuda=use_cuda) | |||||
| net.train() | |||||
| if use_cuda: | |||||
| net.cuda() | |||||
| optimizer = torch.optim.Adam(net.parameters(), lr=base_lr) | |||||
| train_data=TrainImageReader(imdb,48,batch_size,shuffle=True) | |||||
| for cur_epoch in range(1,end_epoch+1): | |||||
| train_data.reset() | |||||
| accuracy_list=[] | |||||
| cls_loss_list=[] | |||||
| bbox_loss_list=[] | |||||
| landmark_loss_list=[] | |||||
| for batch_idx,(image,(gt_label,gt_bbox,gt_landmark))in enumerate(train_data): | |||||
| im_tensor = [ image_tools.convert_image_to_tensor(image[i,:,:,:]) for i in range(image.shape[0]) ] | |||||
| im_tensor = torch.stack(im_tensor) | |||||
| im_tensor = Variable(im_tensor) | |||||
| gt_label = Variable(torch.from_numpy(gt_label).float()) | |||||
| gt_bbox = Variable(torch.from_numpy(gt_bbox).float()) | |||||
| gt_landmark = Variable(torch.from_numpy(gt_landmark).float()) | |||||
| if use_cuda: | |||||
| im_tensor = im_tensor.cuda() | |||||
| gt_label = gt_label.cuda() | |||||
| gt_bbox = gt_bbox.cuda() | |||||
| gt_landmark = gt_landmark.cuda() | |||||
| cls_pred, box_offset_pred, landmark_offset_pred = net(im_tensor) | |||||
| # all_loss, cls_loss, offset_loss = lossfn.loss(gt_label=label_y,gt_offset=bbox_y, pred_label=cls_pred, pred_offset=box_offset_pred) | |||||
| cls_loss = lossfn.cls_loss(gt_label,cls_pred) | |||||
| box_offset_loss = lossfn.box_loss(gt_label,gt_bbox,box_offset_pred) | |||||
| landmark_loss = lossfn.landmark_loss(gt_label,gt_landmark,landmark_offset_pred) | |||||
| all_loss = cls_loss*0.8+box_offset_loss*0.6+landmark_loss*1.5 | |||||
| if batch_idx%frequent==0: | |||||
| accuracy=compute_accuracy(cls_pred,gt_label) | |||||
| show1 = accuracy.data.tolist()[0] | |||||
| show2 = cls_loss.data.tolist()[0] | |||||
| show3 = box_offset_loss.data.tolist()[0] | |||||
| show4 = landmark_loss.data.tolist()[0] | |||||
| show5 = all_loss.data.tolist()[0] | |||||
| print "%s : Epoch: %d, Step: %d, accuracy: %s, det loss: %s, bbox loss: %s, landmark loss: %s, all_loss: %s, lr:%s "%(datetime.datetime.now(),cur_epoch,batch_idx, show1,show2,show3,show4,show5,base_lr) | |||||
| accuracy_list.append(accuracy) | |||||
| cls_loss_list.append(cls_loss) | |||||
| bbox_loss_list.append(box_offset_loss) | |||||
| landmark_loss_list.append(landmark_loss) | |||||
| optimizer.zero_grad() | |||||
| all_loss.backward() | |||||
| optimizer.step() | |||||
| accuracy_avg = torch.mean(torch.cat(accuracy_list)) | |||||
| cls_loss_avg = torch.mean(torch.cat(cls_loss_list)) | |||||
| bbox_loss_avg = torch.mean(torch.cat(bbox_loss_list)) | |||||
| landmark_loss_avg = torch.mean(torch.cat(landmark_loss_list)) | |||||
| show6 = accuracy_avg.data.tolist()[0] | |||||
| show7 = cls_loss_avg.data.tolist()[0] | |||||
| show8 = bbox_loss_avg.data.tolist()[0] | |||||
| show9 = landmark_loss_avg.data.tolist()[0] | |||||
| print "Epoch: %d, accuracy: %s, cls loss: %s, bbox loss: %s, landmark loss: %s " % (cur_epoch, show6, show7, show8, show9) | |||||
| torch.save(net.state_dict(), os.path.join(model_store_path,"onet_epoch_%d.pt" % cur_epoch)) | |||||
| torch.save(net, os.path.join(model_store_path,"onet_epoch_model_%d.pkl" % cur_epoch)) | |||||
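`compute_accuracy` above scores only samples whose ground-truth label is >= 0 (negatives 0 and positives 1), ignoring part-face and landmark samples tagged with negative labels. The same masking logic in plain numpy, as an illustration only (`masked_accuracy` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def masked_accuracy(prob_cls, gt_cls, threshold=0.6):
    """Classification accuracy over samples with label >= 0, thresholding
    the face probability at 0.6 as compute_accuracy does above."""
    gt_cls = np.asarray(gt_cls, dtype=float)
    prob_cls = np.asarray(prob_cls, dtype=float)
    mask = gt_cls >= 0                         # drop part/landmark samples
    pred = (prob_cls[mask] >= threshold).astype(float)
    return float(np.mean(pred == gt_cls[mask]))

labels = [1, 0, -1, 1, -2, 0]                  # -1 / -2 are excluded
probs = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7]
acc = masked_accuracy(probs, labels)
```

Masking keeps the accuracy metric meaningful: part faces and landmark crops carry no face/non-face label, so counting them would only add noise.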
| @@ -0,0 +1,50 @@ | |||||
| import argparse | |||||
| import sys | |||||
| from core.imagedb import ImageDB | |||||
| import train as train | |||||
| import config | |||||
| import os | |||||
| def train_net(annotation_file, model_store_path, | |||||
| end_epoch=16, frequent=200, lr=0.01, batch_size=128, use_cuda=False): | |||||
| imagedb = ImageDB(annotation_file) | |||||
| gt_imdb = imagedb.load_imdb() | |||||
| gt_imdb = imagedb.append_flipped_images(gt_imdb) | |||||
| train.train_onet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr, use_cuda=use_cuda) | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Train ONet', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', | |||||
| default=os.path.join(config.ANNO_STORE_DIR,config.ONET_TRAIN_IMGLIST_FILENAME), help='training data annotation file', type=str) | |||||
| parser.add_argument('--model_path', dest='model_store_path', help='training model store directory', | |||||
| default=config.MODEL_STORE_DIR, type=str) | |||||
| parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training', | |||||
| default=config.END_EPOCH, type=int) | |||||
| parser.add_argument('--frequent', dest='frequent', help='frequency of logging', | |||||
| default=200, type=int) | |||||
| parser.add_argument('--lr', dest='lr', help='learning rate', | |||||
| default=0.002, type=float) | |||||
| parser.add_argument('--batch_size', dest='batch_size', help='train batch size', | |||||
| default=1000, type=int) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='train with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='training data annotation images prefix root path', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| print 'train ONet arguments:' | |||||
| print args | |||||
| train_net(annotation_file=args.annotation_file, model_store_path=args.model_store_path, | |||||
| end_epoch=args.end_epoch, frequent=args.frequent, lr=args.lr, batch_size=args.batch_size, use_cuda=args.use_cuda) | |||||
| @@ -0,0 +1,49 @@ | |||||
| import argparse | |||||
| import sys | |||||
| from core.imagedb import ImageDB | |||||
| from train import train_pnet | |||||
| import config | |||||
| import os | |||||
| def train_net(annotation_file, model_store_path, | |||||
| end_epoch=16, frequent=200, lr=0.01, batch_size=128, use_cuda=False): | |||||
| imagedb = ImageDB(annotation_file) | |||||
| gt_imdb = imagedb.load_imdb() | |||||
| gt_imdb = imagedb.append_flipped_images(gt_imdb) | |||||
| train_pnet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr, use_cuda=use_cuda) | |||||
| def parse_args(): | |||||
| parser = argparse.ArgumentParser(description='Train PNet', | |||||
| formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
| parser.add_argument('--anno_file', dest='annotation_file', | |||||
| default=os.path.join(config.ANNO_STORE_DIR,config.PNET_TRAIN_IMGLIST_FILENAME), help='training data annotation file', type=str) | |||||
| parser.add_argument('--model_path', dest='model_store_path', help='training model store directory', | |||||
| default=config.MODEL_STORE_DIR, type=str) | |||||
| parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training', | |||||
| default=config.END_EPOCH, type=int) | |||||
| parser.add_argument('--frequent', dest='frequent', help='frequency of logging', | |||||
| default=200, type=int) | |||||
| parser.add_argument('--lr', dest='lr', help='learning rate', | |||||
| default=config.TRAIN_LR, type=float) | |||||
| parser.add_argument('--batch_size', dest='batch_size', help='train batch size', | |||||
| default=config.TRAIN_BATCH_SIZE, type=int) | |||||
| parser.add_argument('--gpu', dest='use_cuda', help='train with gpu', | |||||
| default=config.USE_CUDA, type=bool) | |||||
| parser.add_argument('--prefix_path', dest='prefix_path', help='training data annotation images prefix root path', type=str) | |||||
| args = parser.parse_args() | |||||
| return args | |||||
| if __name__ == '__main__': | |||||
| args = parse_args() | |||||
| print 'train PNet arguments:' | |||||
| print args | |||||
| train_net(annotation_file=args.annotation_file, model_store_path=args.model_store_path, | |||||
| end_epoch=args.end_epoch, frequent=args.frequent, lr=args.lr, batch_size=args.batch_size, use_cuda=args.use_cuda) | |||||
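Note that `--gpu ... type=bool` in the training scripts above is a common argparse pitfall: `bool('False')` is `True`, so any non-empty value on the command line enables CUDA. A hedged workaround using a hypothetical `str2bool` converter (not present in the repo):

```python
import argparse

def str2bool(value):
    """Parse common true/false spellings; plain bool() would treat any
    non-empty string, including 'False', as True."""
    if value.lower() in ('1', 'true', 'yes', 'y'):
        return True
    if value.lower() in ('0', 'false', 'no', 'n'):
        return False
    raise argparse.ArgumentTypeError('expected a boolean, got %r' % value)

parser = argparse.ArgumentParser()
parser.add_argument('--gpu', dest='use_cuda', type=str2bool, default=False)
args = parser.parse_args(['--gpu', 'False'])
```

With `type=bool`, the same invocation would silently set `use_cuda=True`.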
| @@ -0,0 +1,50 @@ | |||||
| import argparse | |||||
| import sys | |||||
| from core.imagedb import ImageDB | |||||
| import train as train | |||||
| import config | |||||
| import os | |||||
| def train_net(annotation_file, model_store_path, | |||||
| end_epoch=16, frequent=200, lr=0.01, batch_size=128, use_cuda=False): | |||||
| imagedb = ImageDB(annotation_file) | |||||
| gt_imdb = imagedb.load_imdb() | |||||
| gt_imdb = imagedb.append_flipped_images(gt_imdb) | |||||
| train.train_rnet(model_store_path=model_store_path, end_epoch=end_epoch, imdb=gt_imdb, batch_size=batch_size, frequent=frequent, base_lr=lr, use_cuda=use_cuda) | |||||
| def parse_args(): | |||||
|     parser = argparse.ArgumentParser(description='Train RNet', | |||||
|                                      formatter_class=argparse.ArgumentDefaultsHelpFormatter) | |||||
|     parser.add_argument('--anno_file', dest='annotation_file', | |||||
|                         default=os.path.join(config.ANNO_STORE_DIR, config.RNET_TRAIN_IMGLIST_FILENAME), help='training data annotation file', type=str) | |||||
|     parser.add_argument('--model_path', dest='model_store_path', help='training model store directory', | |||||
|                         default=config.MODEL_STORE_DIR, type=str) | |||||
|     parser.add_argument('--end_epoch', dest='end_epoch', help='end epoch of training', | |||||
|                         default=config.END_EPOCH, type=int) | |||||
|     parser.add_argument('--frequent', dest='frequent', help='frequency of logging', | |||||
|                         default=200, type=int) | |||||
|     parser.add_argument('--lr', dest='lr', help='learning rate', | |||||
|                         default=config.TRAIN_LR, type=float) | |||||
|     parser.add_argument('--batch_size', dest='batch_size', help='train batch size', | |||||
|                         default=config.TRAIN_BATCH_SIZE, type=int) | |||||
|     # NOTE: argparse's type=bool treats any non-empty string (even "False") as True, | |||||
|     # so parse the flag value explicitly | |||||
|     parser.add_argument('--gpu', dest='use_cuda', help='train with gpu', | |||||
|                         default=config.USE_CUDA, type=lambda v: str(v).lower() in ('true', '1', 'yes')) | |||||
|     parser.add_argument('--prefix_path', dest='prefix_path', help='training data annotation images prefix root path', type=str) | |||||
|     args = parser.parse_args() | |||||
|     return args | |||||
| if __name__ == '__main__': | |||||
|     args = parse_args() | |||||
|     print('train Rnet argument:') | |||||
|     print(args) | |||||
|     train_net(annotation_file=args.annotation_file, model_store_path=args.model_store_path, | |||||
|               end_epoch=args.end_epoch, frequent=args.frequent, lr=args.lr, batch_size=args.batch_size, use_cuda=args.use_cuda) | |||||
| @@ -0,0 +1,20 @@ | |||||
| import cv2 | |||||
| from core.detect import create_mtcnn_net, MtcnnDetector | |||||
| import core.vision as vision | |||||
| if __name__ == '__main__': | |||||
|     pnet, rnet, onet = create_mtcnn_net(p_model_path="./model_store/pnet_epoch_5best.pt", r_model_path="./model_store/rnet_epoch_1.pt", o_model_path="./model_store/onet_epoch_7bbest.pt", use_cuda=True) | |||||
|     mtcnn_detector = MtcnnDetector(pnet=pnet, rnet=rnet, onet=onet, min_face_size=24) | |||||
|     img = cv2.imread("./test.jpg") | |||||
|     # OpenCV loads images as BGR; convert to RGB for visualization | |||||
|     img2 = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) | |||||
|     # detection runs on the original BGR image | |||||
|     bboxs, landmarks = mtcnn_detector.detect_face(img) | |||||
|     vision.vis_face(img2, bboxs, landmarks) | |||||
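The `cv2.split`/`cv2.merge([r, g, b])` pair in the test script simply reverses the channel order from BGR to RGB. The same reordering can be expressed as a numpy slice, sketched here on a synthetic array so it runs without OpenCV:

```python
import numpy as np

# a tiny synthetic 2x2 "image" with 3 channels, standing in for a BGR frame
bgr = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# reverse the last axis: channel order B,G,R -> R,G,B
rgb = bgr[:, :, ::-1]

# equivalent to splitting the channels and re-merging them in reverse,
# i.e. what cv2.split followed by cv2.merge([r, g, b]) does
b, g, r = bgr[:, :, 0], bgr[:, :, 1], bgr[:, :, 2]
rgb_manual = np.stack([r, g, b], axis=-1)

assert np.array_equal(rgb, rgb_manual)
```

`cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` performs the same conversion and also guarantees a contiguous result, which some downstream consumers require.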