Improving Efficiency for Object Detection and Temporal Modeling for Action Localization