Things to clarify:
about the network:
==================
remove the .to() calls inside the Module and use the self.on_cpu flag instead
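A minimal sketch of what this could look like (module and parameter names are placeholders, not the actual code): forward() stays device-agnostic and the device is picked once from the on_cpu flag, outside the Module:

    import torch
    import torch.nn as nn

    class KernelNet(nn.Module):            # hypothetical module name
        def __init__(self, in_dim=16, out_dim=8, on_cpu=True):
            super().__init__()
            self.on_cpu = on_cpu           # flag only; no .to() inside the module
            self.fc = nn.Linear(in_dim, out_dim)

        def forward(self, x):
            return self.fc(x)              # x is assumed to already be on the right device

    net = KernelNet(on_cpu=False)
    device = torch.device("cpu" if net.on_cpu or not torch.cuda.is_available() else "cuda")
    net.to(device)                         # move once, outside forward()
    out = net(torch.randn(4, 16, device=device))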
process the datasets separately and leave the dataset as a generic parameter
padding could start at any random point in [0, length_i - pad_length] (see the sketch after this list)
- in training, pad to the shortest
- in test, pad to the largest
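A rough sketch of this padding/cropping scheme as two collate functions (the data layout, variable-length 1-D tensors, is an assumption):

    import random
    import torch
    import torch.nn.functional as F

    def collate_train(batch):
        # cut every sequence to the shortest length in the batch,
        # starting at a random offset in [0, length_i - target]
        target = min(x.size(0) for x in batch)
        cropped = []
        for x in batch:
            start = random.randint(0, x.size(0) - target)
            cropped.append(x[start:start + target])
        return torch.stack(cropped)

    def collate_test(batch):
        # zero-pad every sequence on the right up to the longest length in the batch
        target = max(x.size(0) for x in batch)
        return torch.stack([F.pad(x, (0, target - x.size(0))) for x in batch])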
about the loss and the KTA:
===========================
not clear whether we should define the loss as in "On kernel target alignment", i.e., with the Frobenius inner product <K,Y>_F in the numerator (and the sign changed so we minimize), or as the Frobenius norm ||K - Y||_F. What about the denominator? (right now the normalization factor is n**2)
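For reference, the two candidates written out as code (here Y is assumed to be the target kernel built from the labels, e.g. y y^T with +-1 labels):

    import torch

    def kta_loss(K, Y):
        # alignment as in "On kernel target alignment":
        # A(K, Y) = <K, Y>_F / (||K||_F * ||Y||_F), sign flipped so minimizing raises alignment
        return -(K * Y).sum() / (K.norm(p="fro") * Y.norm(p="fro"))

    def frobenius_loss(K, Y):
        # alternative: squared Frobenius distance, divided by n**2
        # (the normalization factor currently in the code, per the note above)
        n = K.size(0)
        return ((K - Y) ** 2).sum() / n ** 2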
maybe the sav-loss is something that makes sense to impose, as a regularization, across several of the last layers and not only the very last one?
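If that is tried, a hypothetical form of it (the feature list and the linear kernel are assumptions) could be:

    import torch

    def sav_regularizer(feats, Y, lam=0.1):
        # feats: activations of the last few layers, each of shape [n, d]
        reg = 0.0
        for F in feats:
            K = F @ F.t()                  # linear kernel on this layer's features
            reg = reg - (K * Y).sum() / (K.norm(p="fro") * Y.norm(p="fro"))
        return lam * reg                   # added to the main task loss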
are the contributions of the two losses comparable, or does one contribute far more than the other?
is the TwoClassBatch the best way?
maybe I have to review the validation of the sav-loss; since it is batched, it might always be checking the same submatrices for alignment, and those may be mostly positive or mostly near an identity?
SAV: how should the range of k(xi, xj) be interpreted? how to decide the value threshold for returning -1 or +1?
I guess the best thing to do is to learn a simple threshold, a single 1-to-1 feed-forward layer
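One way that could look (purely a sketch; the training setup is an assumption): a single Linear(1, 1) on the kernel value, trained with a binary same-class target, whose learned weight and bias play the role of the threshold:

    import torch
    import torch.nn as nn

    thresh = nn.Linear(1, 1)               # the 1-to-1 feed-forward
    criterion = nn.BCEWithLogitsLoss()
    opt = torch.optim.SGD(thresh.parameters(), lr=1e-2)

    def decide(k_values):
        # k_values: kernel evaluations k(xi, xj), shape [m]
        logits = thresh(k_values.unsqueeze(1)).squeeze(1)
        return torch.where(logits > 0, torch.ones_like(logits), -torch.ones_like(logits))

    # one dummy training step: same_class is 1.0 for same-class pairs, 0.0 otherwise
    k_values = torch.rand(32)
    same_class = torch.randint(0, 2, (32,)).float()
    loss = criterion(thresh(k_values.unsqueeze(1)).squeeze(1), same_class)
    opt.zero_grad(); loss.backward(); opt.step()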