Unsupervised discovery of commonalities in images has recently attracted much interest due to the need to find correspondences in large amounts of visual data. A natural extension, and a relatively unexplored problem, is how to discover common semantic temporal patterns in videos: given two or more videos, find the subsequences that contain similar visual content in an unsupervised manner. We call this problem Temporal Commonality Discovery (TCD). This paper proposes an efficient branch-and-bound (B&B) algorithm to tackle the TCD problem. We derive tight bounds for classical distances between the temporal bag-of-words histograms of two segments, including the ℓ1, intersection, and χ² distances. Using these bounds, the B&B algorithm can efficiently find the globally optimal solution. Our algorithm is general and can be applied to any feature that has been quantized into histograms. To the best of our knowledge, this is the first work that addresses unsupervised discovery of common events in videos.
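To make the problem statement concrete, the following is a minimal sketch of the TCD objective solved by exhaustive search, not the proposed B&B algorithm. It assumes each video has already been reduced to one codeword index per frame and that histograms are raw, unnormalized counts; the function names, the `min_len` constraint, and the χ² form without a 1/2 factor are illustrative assumptions rather than details taken from the paper.

```python
def segment_histogram(labels, begin, end, vocab_size):
    """Temporal bag-of-words histogram over frames begin..end (inclusive)."""
    hist = [0] * vocab_size
    for word in labels[begin:end + 1]:
        hist[word] += 1
    return hist


def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))


def intersection_distance(h1, h2):
    # Negated histogram intersection, so that smaller means more similar.
    return -sum(min(a, b) for a, b in zip(h1, h2))


def chi2_distance(h1, h2):
    # One common chi-square form; some definitions include a factor of 1/2.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def brute_force_tcd(labels_a, labels_b, vocab_size, min_len, distance=l1_distance):
    """Try every pair of segments (each at least min_len frames long) and
    return (distance, (begin_a, end_a), (begin_b, end_b)) for the best pair."""
    best_dist, best_a, best_b = float("inf"), None, None
    for b1 in range(len(labels_a) - min_len + 1):
        for e1 in range(b1 + min_len - 1, len(labels_a)):
            hist_a = segment_histogram(labels_a, b1, e1, vocab_size)
            for b2 in range(len(labels_b) - min_len + 1):
                for e2 in range(b2 + min_len - 1, len(labels_b)):
                    hist_b = segment_histogram(labels_b, b2, e2, vocab_size)
                    d = distance(hist_a, hist_b)
                    if d < best_dist:
                        best_dist, best_a, best_b = d, (b1, e1), (b2, e2)
    return best_dist, best_a, best_b


# Toy example: both "videos" contain the codeword pattern 1, 2, 3.
video_a = [0, 0, 1, 2, 3, 0, 0]
video_b = [4, 4, 4, 1, 2, 3, 4]
print(brute_force_tcd(video_a, video_b, vocab_size=5, min_len=3))
# prints (0, (2, 4), (3, 5))
```

For videos of n and m frames, this search evaluates on the order of n²m² segment pairs, which is the cost that bounding the distance over sets of candidate segments is meant to avoid.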