Q: What is Automatic Differentiation?

A: Speaking non-rigorously, consider a computer program as a function mapping between states; this function, in the mathematical sense, can have a derivative, and AD computes that derivative alongside the program itself. The motivation comes mainly from the fields of Machine Learning and Numerical Analysis.

Q: How to compute AD?

A: There are many ways. A quick introduction covering one or two of them is in this post

Q: What is the complexity difference between Forward Mode and Reverse Mode?

A: The time-complexity difference only becomes apparent for tensor (vector-valued) computations. It essentially reduces to the question of the cheapest order in which to evaluate a product of a sequence of Jacobian matrices: forward mode multiplies right-to-left (starting from the inputs), reverse mode left-to-right (starting from the outputs).
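A rough illustration of the matrix-chain point, assuming a composition f = f3 ∘ f2 ∘ f1 : R^n → R with hypothetical Jacobian shapes 1×m, m×m, and m×n (sizes chosen arbitrarily for the sketch):

```python
# Hypothetical sizes: many inputs (n), one scalar output.
n, m = 1000, 100

# Cost of multiplying a (p x q) matrix by a (q x r) matrix ~ p*q*r flops.
# Forward mode, right-to-left: J2 @ J1 first (m*m*n), then J3 @ (...) (1*m*n).
forward = m * m * n + 1 * m * n

# Reverse mode, left-to-right: J3 @ J2 first (1*m*m), then (...) @ J1 (1*m*n).
reverse = 1 * m * m + 1 * m * n

print(forward, reverse)  # reverse is far cheaper for many-inputs/one-output
```

With many inputs and a scalar output (the typical ML gradient case), reverse mode wins; with one input and many outputs, the comparison flips in favor of forward mode.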

Q: What are all the possible ways of AD?

A: According to this post, the classification includes Forward Mode, Reverse Mode, Source Transformation, and Operator Overloading. The first two describe the direction in which derivatives propagate; the last two describe implementation techniques.

Q: Is operator overloading just dual number?

A: Seems so, according to this wiki. This method seems very easy to implement in forward mode.
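A minimal sketch of forward mode via operator overloading on dual numbers (names and the example function are my own, not from the post):

```python
import math

class Dual:
    """Dual number val + der*eps with eps^2 = 0; der carries the derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + b*eps)(c + d*eps) = ac + (bc + ad)*eps
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def sin(x):
    # Chain rule through sin
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

# d/dx [x*x + sin(x)] = 2x + cos(x); evaluate at x = 2
x = Dual(2.0, 1.0)        # seed the input's derivative with 1
y = x * x + sin(x)
print(y.val, y.der)       # value and derivative in one pass
```

The point of operator overloading here is that the original expression `x * x + sin(x)` is written exactly as ordinary code; the derivative rides along in the `der` slot.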

Q: I can handle dual numbers during source transformation, just as Haskell can easily define a dual number here. So if operator overloading is just dual numbers, why is a ‘tape’ needed? And how is that not source transformation?

A: ????

Q: How is operator overloading with reverse-mode implemented?

A: ????
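One common approach (a sketch under my own assumptions, not taken from the post): the overloaded operators do not propagate derivatives immediately; instead they record each operation and its local partial derivatives onto a tape at runtime. A reverse sweep over the tape then accumulates adjoints from the output back to the inputs — this runtime recording is also what distinguishes it from source transformation, which generates derivative code ahead of time.

```python
tape = []  # each entry: list of (parent_index, local_partial) pairs

class Var:
    def __init__(self, val):
        self.val = val
        self.idx = len(tape)
        tape.append([])                      # inputs have no parents
    def _record(self, val, parents):
        out = Var.__new__(Var)               # new node recorded on the tape
        out.val = val
        out.idx = len(tape)
        tape.append(parents)
        return out
    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return self._record(self.val + other.val,
                            [(self.idx, 1.0), (other.idx, 1.0)])
    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return self._record(self.val * other.val,
                            [(self.idx, other.val), (other.idx, self.val)])

def grad(output):
    """Reverse sweep: walk the tape backwards, accumulating adjoints."""
    adj = [0.0] * len(tape)
    adj[output.idx] = 1.0
    for i in range(output.idx, -1, -1):
        for parent, partial in tape[i]:
            adj[parent] += partial * adj[i]
    return adj

x = Var(2.0)
y = Var(3.0)
z = x * y + x            # dz/dx = y + 1 = 4, dz/dy = x = 2
adj = grad(z)
print(adj[x.idx], adj[y.idx])
```

Note the contrast with the dual-number approach: one reverse sweep yields the derivative with respect to every input at once, whereas forward mode would need one pass per input.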