Add --no-delay option.
(Read the discussion at HA thread from here
--no-delay will compensate encoder delay (2112 samples) by prepending silence of 960 samples before sending input to encoder, then trimming 3 AAC frames at beginning (2112 + 960 = 3072 = 1024 * 3, where 1024 is the frame length of AAC. So total amount of delay will be exactly equals to length of 3 AAC frames). Note that these numbers are doubled in case of SBR.
This option is meant for video as a mean to resolve A/V sync issue. The resultant AAC will have exactly zero-delay, but might have pops/clicks at the beginning. Use with care.