KVMNet

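Implementation of a Transformer-style Key-Value Memory Network (KVMNet). Its key addressing and value reading steps are closely related to query-key-value (QKV) self-attention.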

Built a Transformer KVMNet that looks up biographical information on public figures, using PyTorch, NumPy, GloVe word embeddings, and NLTK. Implemented and trained on Google Colab's free TPUs. The model architecture is based on "Key-Value Memory Networks for Directly Reading Documents" by Miller et al.: https://arxiv.org/abs/1606.03126.
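
For readers new to the architecture, here is a minimal sketch of a single memory read (key addressing followed by value reading) as described in the paper. It is not this repository's code. It uses only the Python standard library so it runs anywhere, whereas the actual model uses PyTorch. The function name `read_memory` and the toy vectors are illustrative assumptions, and the learned embedding matrices and the per-hop query update from the paper are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def read_memory(query, keys, values):
    """One memory hop (hypothetical helper, not from the repository).

    Key addressing: score each key against the query and normalize
    the scores with a softmax.
    Value reading: return the weighted sum of the value vectors.
    """
    weights = softmax([dot(query, key) for key in keys])
    dim = len(values[0])
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output


# Toy memory with two slots: the query lines up with the first key,
# so the read-out is dominated by the first value vector.
query = [1.0, 0.0]
keys = [[5.0, 0.0], [0.0, 5.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

weights, output = read_memory(query, keys, values)
print(weights)
print(output)
```

In the full model, the query, keys, and values would be embedded representations of the question and the stored facts (for example, built from GloVe vectors), and the read-out would be used to update the query before the next hop.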