kernel_authorship/TODO.txt

Things to clarify:
about the network:
==================
remove the scattered .to() calls inside the Module and use the self.on_cpu flag instead (sketch below)
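one way to read this (a rough sketch; KernelNet and the embedding layer are hypothetical, only
self.on_cpu comes from the note above): keep a single device move at the input boundary, driven
by self.on_cpu, instead of .to() calls scattered through the module body

    import torch
    from torch import nn

    class KernelNet(nn.Module):
        def __init__(self, on_cpu: bool = False):
            super().__init__()
            self.on_cpu = on_cpu  # single source of truth for the device
            self.device = torch.device("cpu" if on_cpu else "cuda")
            self.embed = nn.Embedding(256, 32)

        def forward(self, x):
            # one move at the boundary, no .to() calls inside the rest of the module
            x = x.to(self.device)
            return self.embed(x)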
process the datasets and leave the dataset as a generic parameter
padding could start at any random offset in [0, length_i - pad_length] (see the sketch after this list)
- in training, pad to the shortest sequence in the batch
- in test, pad to the longest
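a possible implementation of the random-offset cropping/padding (crop_or_pad and collate are
hypothetical helpers, not names from the repo):

    import random
    import torch

    def crop_or_pad(seq, target_len, train=True):
        # seq: 1-D LongTensor of token ids
        n = seq.size(0)
        if n > target_len:
            # random crop: start anywhere in [0, n - target_len]
            start = random.randint(0, n - target_len) if train else 0
            return seq[start:start + target_len]
        # right-pad with zeros up to target_len
        return torch.cat([seq, seq.new_zeros(target_len - n)])

    def collate(batch, train=True):
        # training: crop everything to the shortest sequence in the batch
        # test: pad everything to the longest
        lengths = [s.size(0) for s in batch]
        target = min(lengths) if train else max(lengths)
        return torch.stack([crop_or_pad(s, target, train) for s in batch])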
about the loss and the KTA:
===========================
not clear whether we should define the loss as in "On kernel target alignment", i.e. with the Frobenius inner
product <K, Y>_F in the numerator (sign flipped so it can be minimized), or as the Frobenius norm ||K - Y||_F.
What about the denominator? (right now the normalization factor is n**2); both variants are sketched below
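the two candidates written out (a sketch; K is the batch kernel matrix, Y the target matrix, and the n**2
normalization is the one currently in use):

    import torch

    def kta_loss(K, Y):
        # alignment as in "On kernel target alignment":
        # A(K, Y) = <K, Y>_F / (||K||_F ||Y||_F); return -A so it can be minimized
        num = (K * Y).sum()
        den = K.norm(p="fro") * Y.norm(p="fro")
        return -num / den

    def frobenius_loss(K, Y):
        # alternative: squared Frobenius distance with the current n**2 normalization
        n = K.size(0)
        return ((K - Y) ** 2).sum() / n ** 2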
maybe the sav-loss is something that makes sense to impose, as a regularization, across several of the last
layers, and not only the last one? (rough sketch below)
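what that could look like (a sketch reusing kta_loss from above; the list of per-layer features and the
0.1 weight are placeholders, not values from the repo):

    def multilayer_alignment_loss(features, Y, weight=0.1):
        # features: list of [B, d_l] representations from the last few layers
        total = 0.0
        for h in features:
            K = h @ h.t()               # linear kernel of that layer's outputs
            total = total + kta_loss(K, Y)
        return weight * total / len(features)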
are the contributions of the two losses comparable, or does one contribute far more than the other?
is the TwoClassBatch the best way?
maybe I have to review the validation of the sav-loss; since it is batched, it might always be checking the same
submatrices for alignment, and those may be mostly positive or mostly near an identity?
SAV: how should the range of k(xi, xj) be interpreted? how do we decide the threshold value for returning -1 or +1?
I guess the best thing to do is to learn a simple threshold, a single 1-to-1 feed-forward layer (sketched below)
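sketch of the learned threshold (the module name and the tanh squashing are my guesses, not code from the repo):

    import torch
    from torch import nn

    class KernelThreshold(nn.Module):
        def __init__(self):
            super().__init__()
            # a single 1-to-1 linear layer learns a scale and an offset,
            # i.e. where to put the decision threshold on k(xi, xj)
            self.lin = nn.Linear(1, 1)

        def forward(self, k_values):
            # k_values: kernel evaluations k(xi, xj), shape [B]
            logits = self.lin(k_values.unsqueeze(-1)).squeeze(-1)
            return torch.tanh(logits)   # soft -1/+1; take sign() at test time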