Towards rigorous evaluation of binary testing and analysis