Q: What is Automatic Differentiation?
A: Not rigorously: consider a computer program as a function mapping between states; that function, in the mathematical sense, can have a derivative, and Automatic Differentiation (AD) computes it by exploiting the program's structure. The motivation comes mainly from the fields of Machine Learning and Numerical Analysis.
Q: How to compute AD?
A: There are many ways. A quick introduction covering one or two of them is in this post.
Q: What is the complexity difference between Forward Mode and Reverse Mode?
A: The time-complexity difference only becomes apparent when computing with tensors: the chain rule turns differentiation into a product of Jacobian matrices, and the question basically becomes "in which order is a consecutive multiplication of a sequence of matrices cheapest?" Forward mode accumulates the product from the input side (cheap when there are few inputs), reverse mode from the output side (cheap when there are few outputs, e.g. a scalar loss).
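The matrix-chain analogy can be made concrete with a toy cost count. The Jacobian shapes below are made up for illustration: a function with 1000 inputs and a single scalar output, composed of three stages.

```python
# Multiplying an (a x b) matrix by a (b x c) matrix costs a*b*c scalar
# multiplications. For the chain J3 @ J2 @ J1 the association order
# (i.e. forward vs reverse accumulation) changes the total cost.

def matmul_cost(a, b, c):
    """Scalar multiplications for an (a x b) @ (b x c) product."""
    return a * b * c

# J1: 1000x1000 (first stage), J2: 1000x1000, J3: 1x1000 (scalar output).
# Reverse mode associates from the output side: (J3 @ J2) @ J1.
reverse_cost = matmul_cost(1, 1000, 1000) + matmul_cost(1, 1000, 1000)
# Forward mode associates from the input side: J3 @ (J2 @ J1).
forward_cost = matmul_cost(1000, 1000, 1000) + matmul_cost(1, 1000, 1000)

print(reverse_cost)   # 2_000_000
print(forward_cost)   # 1_001_000_000
```

With many inputs and one output, reverse accumulation is cheaper by a factor of roughly the input dimension, which is why reverse mode dominates in Machine Learning.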
Q: What are all the possible ways of AD?
A: This post says the classification includes Forward Mode, Reverse Mode, Source Transformation, and Operator Overloading. (The first two describe the direction in which derivatives are accumulated; the last two describe how the derivative computation is implemented.)
Q: Is operator overloading just dual number?
A: Seems like it, according to this wiki — at least for forward mode, where operator overloading amounts to arithmetic on dual numbers. This method is very easy to implement with forward mode.
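A minimal sketch of forward-mode AD via operator overloading on dual numbers: each value carries a (primal, tangent) pair, and the overloaded operators propagate both by the chain rule. The class and function names are my own, for illustration.

```python
class Dual:
    """A dual number: val + dot * eps, with eps^2 = 0."""
    def __init__(self, val, dot=0.0):
        self.val = val    # primal value
        self.dot = dot    # derivative w.r.t. the chosen input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # f'(x) = 6x + 2

y = f(Dual(2.0, 1.0))   # seed the tangent with dx/dx = 1
print(y.val, y.dot)      # 17.0 14.0
```

Note that the derivative falls out of a single ordinary evaluation of `f`; no tape or second pass is needed.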
Q: I can also deal with dual numbers during source transformation, just as Haskell can easily define a dual number here. So if operator overloading is just dual numbers, why is a ‘tape’ needed? And how is that not source transformation?
A: Dual numbers only give you forward mode, and forward mode needs no tape: derivatives propagate alongside the primal computation in a single pass. A tape only enters in reverse mode, where the operations and their intermediate values must be recorded so they can be replayed backwards. And it is not source transformation because nothing is done to the program text: the overloaded operators compute (or record) derivatives at run time, whereas source transformation generates new derivative code ahead of time.
Q: How is operator overloading with reverse-mode implemented?
A: The overloaded operators, instead of computing derivatives immediately, append each primitive operation together with its inputs and local partial derivatives to a tape as the program runs. After the forward pass, the tape is traversed in reverse, applying the chain rule to accumulate adjoints from the output back to the inputs. This is the approach taken by eager frameworks such as PyTorch’s autograd.
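A minimal sketch of reverse-mode AD via operator overloading, under the assumption of a single global tape (real implementations attach the graph to the values instead; all names here are made up for illustration):

```python
tape = []  # records of (output, [(input, local_derivative), ...])

class Var:
    """A value that records the operations applied to it."""
    def __init__(self, val):
        self.val = val
        self.grad = 0.0   # adjoint, filled in by the backward pass

    def __add__(self, other):
        out = Var(self.val + other.val)
        # d(out)/d(self) = 1, d(out)/d(other) = 1
        tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.val * other.val)
        # d(out)/d(self) = other.val, d(out)/d(other) = self.val
        tape.append((out, [(self, other.val), (other, self.val)]))
        return out

def backward(output):
    """Walk the tape in reverse, accumulating adjoints (chain rule)."""
    output.grad = 1.0
    for out, parents in reversed(tape):
        for parent, local in parents:
            parent.grad += local * out.grad

x = Var(2.0)
y = Var(3.0)
z = x * y + x            # z = xy + x; dz/dx = y + 1, dz/dy = x
backward(z)
print(z.val, x.grad, y.grad)   # 8.0 4.0 2.0
```

One forward evaluation plus one reverse sweep yields the gradient with respect to every input at once, which is exactly why the tape pays for itself when inputs vastly outnumber outputs.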