logSumExp : Exp a => Ord a => Neg a => Foldable (Tensor [i]) => AllAlgebra [i] a => Tensor [i] a -> Maybe a

Numerically stable log-sum-exp operation
LSE(x) = max(x) + log(Σᵢ exp(xᵢ - max(x)))
See https://gregorygundersen.com/blog/2020/02/09/log-sum-exp/
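The max-shift identity above can be sketched numerically; this is a hypothetical Python analogue of the operation, not the library's implementation. It returns `None` for empty input, mirroring the `Maybe a` in the signature.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log-sum-exp over a list of floats.

    LSE(x) = max(x) + log(sum_i exp(x_i - max(x))). Shifting by the
    maximum keeps every exponent <= 0, so exp never overflows.
    """
    if not xs:
        return None  # analogue of Nothing for an empty tensor
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

For example, `log_sum_exp([1000.0, 1000.0])` returns `1000.0 + log(2)`, whereas the naive `log(sum(exp(x)))` would overflow at `exp(1000.0)`.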
Visibility: public export

logSoftargmax : Exp a => Ord a => Neg a => Foldable (Tensor [i]) => AllAlgebra [i] a => Tensor [i] a -> Tensor [i] a

Log(softargmax(x)), but computationally efficient and numerically stable
Used for computing cross-entropy loss
Returns empty tensor for empty input
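The efficiency claim rests on the identity log(softargmax(x))ᵢ = xᵢ - LSE(x): computing the stable LSE once avoids exponentiating and then re-taking the log of each entry. A hypothetical Python sketch of that identity (not the library's code):

```python
import math

def log_softargmax(xs):
    """log(softargmax(x))_i = x_i - LSE(x), with LSE computed stably.

    Avoids the naive log(exp(x_i) / sum_j exp(x_j)), which overflows
    for large inputs. Returns an empty list for empty input, mirroring
    the documented behaviour on empty tensors.
    """
    if not xs:
        return []
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]
```

Exponentiating the result recovers a probability vector summing to 1, which is why this form is convenient inside cross-entropy loss.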
Visibility: public export

softargmaxImpl : Fractional a => Exp a => Ord a => Neg a => Foldable (Tensor [i]) => AllAlgebra [i] a => {default 1 _ : a} -> Tensor [i] a -> Tensor [i] a

Commonly known as 'softmax'
When `temperature=0` it reduces to `argmax`
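The temperature behaviour can be illustrated with a hedged Python sketch (function name and default mirror the entry above, but this is an illustration, not the library's implementation). Note this sketch divides by the temperature, so `temperature=0` itself would divide by zero; the reduction to argmax holds in the limit as the temperature approaches 0.

```python
import math

def softargmax(xs, temperature=1.0):
    """Softargmax ('softmax') with a temperature parameter.

    Logits are divided by the temperature, then exponentiated with a
    max-shift for stability and normalised. Low temperatures sharpen
    the distribution toward a one-hot argmax; high temperatures
    flatten it toward uniform.
    """
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softargmax([1.0, 2.0], temperature=0.01)` puts essentially all mass on the larger entry.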
Visibility: public export

softargmax : Fractional a => Exp a => Ord a => Neg a => Foldable (Tensor [i]) => AllAlgebra [i] a => Tensor [i] a --> Tensor [i] a

Softargmax as a parametric map, with temperature as a parameter
TODO: the output type should be a distribution tensor, since distributions
are applicative? https://glaive-research.org/2025/02/11/Generalized-Transformers-from-Applicative-Functors.html
Visibility: public export