This means that SARSA takes into account the control policy by which the agent is moving, and incorporates it into its update of action values, whereas Q-learning simply assumes that an optimal policy is being followed.
The difference between Q-learning and SARSA is that Q-learning makes an assumption about the control policy being used, while SARSA actually takes into account the behaviour of the control policy when updating Q-values.
Well, not quite. A key difference between SARSA and Q-learning is that SARSA is an on-policy algorithm (it follows the policy it is learning), while Q-learning is an off-policy algorithm (it can follow any policy that fulfils some convergence requirements).
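The on-policy/off-policy distinction shows up directly in the update rules. Below is a minimal sketch assuming a tabular Q-function stored as a dict; the function names, and the ALPHA and GAMMA values, are illustrative, not from any particular library:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99  # illustrative learning rate and discount factor

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: bootstraps from a_next, the action the behaviour
    policy actually takes in s_next."""
    td_target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions):
    """Off-policy: bootstraps from the greedy (max) action in s_next,
    regardless of which action the behaviour policy takes."""
    td_target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])
```

The only difference is the bootstrap term: SARSA uses the value of the action actually taken next, Q-learning uses the maximum over next actions.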
Even when both SARSA and Q-learning use an ε-greedy policy to strike a balance between exploration and exploitation, they still arrive at different estimates of Q. Q-learning usually produces more aggressive estimates, while SARSA usually produces more conservative ones.
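For reference, an ε-greedy policy as used by both algorithms can be sketched as follows (a minimal version assuming the same dict-based tabular Q as above):

```python
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """With probability epsilon pick a uniformly random action (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```

Because SARSA's update bootstraps from actions this policy actually samples, its estimates account for the occasional exploratory step, whereas Q-learning's max-based update ignores exploration and so tends toward the more optimistic greedy values.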