Advancing reasoning and planning in large language models via reward shaping