Semantic image understanding: from pixel to word