Good plan. This is more or less what I would also build if I wanted to run a quick hack and see if it’s able to find some alpha.
There was some work with LSTMs to predict “a more fancy market momentum” that was pretty similar. I do like your multitask learning approach. Pretty key to include a side-task of predicting the next tick, even if that’s not the core task you care about.
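To make the side-task idea concrete, here’s a minimal sketch of a combined loss. It assumes both heads (the main momentum-style target and the next-tick prediction) are plain regressions trained with MSE. `multitask_loss` and `aux_weight` are names I made up for illustration, and the weight is something you’d tune, not a known-good value.

```python
import numpy as np


def multitask_loss(pred_main, target_main, pred_next_tick, target_next_tick, aux_weight=0.5):
    """Hypothetical multitask loss: main-task MSE plus a weighted next-tick MSE.

    Assumes both heads are regressions; aux_weight is an assumed
    hyperparameter that scales how much the side task contributes.
    """
    main = np.mean((np.asarray(pred_main) - np.asarray(target_main)) ** 2)
    aux = np.mean((np.asarray(pred_next_tick) - np.asarray(target_next_tick)) ** 2)
    return float(main + aux_weight * aux)
```

The design choice is just a weighted sum: the next-tick head always contributes to the loss, so the shared layers get a training signal from it even when the main target is mostly noise.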
Biggest downside I’d see is that for something this noisy, the model might just not learn anything. That’s unlike text, or say predicting sports results, where you’ll get some learning no matter what, so the model will start to pick up on structure.
To give the model some structure to pick up on, you might want to add tasks predicting correlations between stocks. That obviously means a bigger output vector, but you could train with a mask, so that not every correlation gets gradient feedback every time.
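One way the masked correlation task could look, as a rough sketch. Assumptions: the model emits one output per stock pair (the upper triangle of the correlation matrix, so N*(N-1)/2 outputs), the target is the realized correlation over a trailing window of returns, and a fresh random mask keeps roughly `keep_frac` of the pairs each step. `pairwise_corr_targets`, `masked_corr_loss` and `keep_frac` are all hypothetical names.

```python
import numpy as np


def pairwise_corr_targets(returns):
    """Realized pairwise correlations from a (T, N) window of returns.

    Returns the upper triangle (excluding the diagonal) as a flat
    vector of length N*(N-1)/2, matching an assumed per-pair output head.
    """
    corr = np.corrcoef(returns, rowvar=False)  # (N, N), columns are stocks
    iu = np.triu_indices(corr.shape[0], k=1)
    return corr[iu]


def masked_corr_loss(pred, target, keep_frac=0.1, rng=None):
    """MSE over a random subset of pairs; unmasked pairs add nothing to the loss.

    keep_frac is an assumed hyperparameter: the expected fraction of
    pairs that receive feedback on this step.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(pred.shape) < keep_frac
    if not mask.any():
        return 0.0, mask  # nothing selected this step
    err = (pred - target) ** 2
    return float(err[mask].mean()), mask


# Usage: 20 stocks -> 190 pair outputs, roughly 19 of them scored per step.
rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 20))
target = pairwise_corr_targets(returns)
loss, mask = masked_corr_loss(np.zeros_like(target), target, keep_frac=0.1, rng=rng)
```

In an autodiff framework the same indexing works: pairs outside the mask don’t appear in the loss, so they get zero gradient on that step.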
I think there’s a reason this kind of work was first (publicly) done for market momentum. If you can’t at least predict that…