Because facial action units (AUs) are active only on sparse facial regions, region learning (RL) aims to identify these regions for better specificity. Meanwhile, strong statistical evidence of AU correlations suggests that multi-label learning (ML) is a natural way to model the detection task. This paper proposes Deep Region and Multi-label Learning (DRML), a unified deep network that addresses both problems simultaneously. A crucial component of DRML is a novel region layer that uses feed-forward functions to induce important facial regions, forcing the learned weights to capture structural information of the face. The region layer serves as an alternative design between locally connected layers (i.e., kernels confined to individual pixels) and conventional convolution layers (i.e., kernels shared across an entire image). By construction, DRML addresses both RL and ML, allowing the two seemingly unrelated problems to interact directly. The complete network is end-to-end trainable and automatically learns representations robust to variations inherent within a local region.
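To make the weight-sharing idea concrete, the following is a minimal sketch of a region-style layer and a multi-label loss, not the DRML implementation itself. It assumes a single-channel feature map, a uniform grid of regions, 3x3 kernels with zero padding inside each region, and a ReLU nonlinearity; the names `region_layer`, `conv3x3_same`, and `multi_label_loss` are hypothetical. Kernels are shared within a grid cell but not across cells, which places the layer between a locally connected layer (one kernel per pixel) and a standard convolution (one kernel per image). The loss treats each AU as an independent sigmoid output, a common multi-label formulation assumed here for illustration.

```python
import math


def conv3x3_same(patch, kernel, bias):
    """3x3 cross-correlation with zero padding; output has the patch's shape."""
    h, w = len(patch), len(patch[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    # Pixels outside the patch count as zero, so a region
                    # never reads values from a neighbouring region.
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di + 1][dj + 1] * patch[y][x]
    return out


def region_layer(fmap, kernels, biases, grid=(2, 2)):
    """Apply a separate 3x3 kernel plus ReLU to each cell of a uniform grid.

    fmap:    2D list (H x W), a single-channel feature map (assumption).
    kernels: kernels[r][c] is the 3x3 kernel owned by grid cell (r, c).
    biases:  biases[r][c] is the scalar bias of grid cell (r, c).
    """
    h, w = len(fmap), len(fmap[0])
    gh, gw = grid
    assert h % gh == 0 and w % gw == 0, "grid must divide the map evenly"
    ph, pw = h // gh, w // gw
    out = [[0.0] * w for _ in range(h)]
    for r in range(gh):
        for c in range(gw):
            # Cut out the region, filter it with its own weights,
            # and write the rectified result back in place.
            patch = [row[c * pw:(c + 1) * pw] for row in fmap[r * ph:(r + 1) * ph]]
            conv = conv3x3_same(patch, kernels[r][c], biases[r][c])
            for i in range(ph):
                for j in range(pw):
                    out[r * ph + i][c * pw + j] = max(0.0, conv[i][j])
    return out


def multi_label_loss(logits, labels):
    """Mean sigmoid cross-entropy over AUs, computed stably from logits.

    logits: one real-valued score per AU.
    labels: one 0/1 target per AU; several AUs may be active at once.
    """
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

With `grid=(1, 1)` the sketch reduces to an ordinary convolution over the whole map, while a grid with one cell per pixel approaches a locally connected layer; intermediate grids give the region-wise behaviour described above.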