AlphaGo is local search guided by a learned heuristic, which is trained in a simulator of the game. The heuristic learns an approximation of the value of moves and board states, and is analogous to the "compute_cost_of_tour()" routine in TSP algorithms.
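For concreteness, the kind of routine meant here is just a tour-length evaluator (the function name is the hypothetical one from the comment, not any particular library's API):

```python
import math

def compute_cost_of_tour(tour, coords):
    """Total Euclidean length of a closed tour.

    tour: list of city indices, visited in order; coords: list of (x, y).
    """
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

# A unit square visited in order is a closed tour of length 4.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(compute_cost_of_tour([0, 1, 2, 3], square))  # 4.0
```

The analogy: in TSP this evaluator is exact and cheap; AlphaGo's learned value network plays the same role for positions where no exact evaluation exists.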
In the basic TSP (for example) there is no other data to learn from than "distances" between vertices, and anything learned from a single instance of the problem amounts to overfitting. This might still be useful - for example learning efficient sub-paths on a fixed map, rather than searching for them every time.
Self-organizing maps can be used as a neural approach to finding TSP solutions; in these cases the network itself is the optimized solution. Think of it as ~gradient-descent~ optimization for TSP. Not sure if it is relevant in benchmarks. (I think it might amount to minimizing the sum of squared distances between hops (or a bound on that), not the total tour length. It favours many shorter hops over a few long hops.)
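A toy version of the SOM/elastic-ring idea, to make the "network itself is the solution" point concrete. All parameter choices here (ring size, learning-rate schedule, neighbourhood width) are arbitrary illustrations, not from any tuned implementation:

```python
import math
import random

def som_tsp(coords, n_iter=5000, seed=0):
    """Toy self-organizing map for TSP: a closed ring of neurons is
    repeatedly pulled toward randomly sampled cities; reading the
    cities off in ring order yields a tour. A sketch, not a solver."""
    rng = random.Random(seed)
    n = len(coords) * 4                       # neurons on the ring
    ring = [[rng.random(), rng.random()] for _ in range(n)]
    for t in range(n_iter):
        lr = 0.8 * (1 - t / n_iter)           # decaying learning rate
        radius = max(1, int(n * 0.2 * (1 - t / n_iter)))
        cx, cy = coords[rng.randrange(len(coords))]
        # winner: neuron closest to the sampled city
        w = min(range(n),
                key=lambda i: (ring[i][0] - cx) ** 2 + (ring[i][1] - cy) ** 2)
        # pull the winner and its ring neighbours toward the city
        for d in range(-radius, radius + 1):
            i = (w + d) % n
            g = math.exp(-(d * d) / (2 * (radius / 3 + 1e-9) ** 2))
            ring[i][0] += lr * g * (cx - ring[i][0])
            ring[i][1] += lr * g * (cy - ring[i][1])
    # project each city onto its nearest neuron; order cities by ring position
    pos = {c: min(range(n),
                  key=lambda i: (ring[i][0] - coords[c][0]) ** 2
                              + (ring[i][1] - coords[c][1]) ** 2)
           for c in range(len(coords))}
    return sorted(range(len(coords)), key=pos.get)
```

Note how the "loss" being descended is neuron-to-city distance, which is why the result favours many short hops rather than directly minimizing total tour length.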
(If you want time-window constraints in LKH, IIRC, you can try adding the time-diff as penalties to your global cost function.)
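Something in this spirit (all names here are made up for illustration and are not LKH's actual API; LKH has its own penalty mechanism, this just shows folding the time-diff into a global cost):

```python
def penalized_cost(tour, dist, travel_time, windows, penalty=1000.0):
    """Tour cost plus a penalty term for each time-window violation.

    windows[i] = (earliest, latest) service time at city i; arriving
    early means waiting, arriving late adds penalty * lateness.
    """
    cost, t = 0.0, 0.0
    for a, b in zip(tour, tour[1:] + tour[:1]):   # closed tour edges
        cost += dist[a][b]
        t += travel_time[a][b]
        early, late = windows[b]
        if t < early:
            t = early                              # wait for window to open
        elif t > late:
            cost += penalty * (t - late)           # penalize lateness
    return cost
```

With a large enough penalty weight, a local-search solver that only sees this scalar cost is steered toward time-feasible tours.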
LKH does support a lot of the things mentioned, but for practical use it falls short. It's nice to leave it running and see what it can accomplish, but asking it to hand back something in 1 second, under a lot of constraints, yields solutions that are not feasible.
In the basic TSP there is a lot of data.
For example, the reason why minimum spanning tree works is that the algorithm makes use of the relationships between vertices. Similar techniques use alpha-nearness, Steiner trees and direct modifications of the distance matrix to create relaxations of the TSP and improve the performance of local search (I believe most are implemented in LKH).
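To make the MST point concrete: the MST weight is a classic lower bound on the optimal tour length (every tour minus one edge is a spanning tree), so it captures real structure of the instance rather than overfitting to it. A minimal Prim's sketch:

```python
import math

def mst_weight(coords):
    """Weight of the minimum spanning tree of the complete Euclidean
    graph on coords, via Prim's algorithm. Lower-bounds the optimal
    TSP tour length; doubling the MST gives the textbook 2-approximation."""
    n = len(coords)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest edge connecting each vertex to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(coords[u], coords[v]))
    return total

# Unit square: MST is three sides (weight 3), optimal tour is 4.
print(mst_weight([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 3.0
```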
I am obviously not expecting NNs to be capable of something like that currently, but I'm hoping they might be able to discover interesting instance patterns for something more constrained.
> But these do not stay the same between problem instances; anything you learn from solving one problem is not helpful when solving the next problem.
But nothing in ML stays the same between instances. The reason ML works is that there are redundancies in the training set. I am pretty sure that, distribution-wise, the set of TSP instances still has a lot of redundancies.
You would want your model to learn to execute something like MST, or to approximate alpha-nearness, or to remap the instance into a relaxation that, when solved by a simpler algorithm, yields a solution that maps back to a feasible and optimal solution of the original.