<meta name="twitter:description" content="PyTorch implementation and tutorial of Capsule Networks. Capsule Networks is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to the next layer of capsules."/>
<meta name="twitter:description" content="A simple PyTorch implementation/tutorial of Cycle GAN, introduced in the paper Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks."/>
<meta name="twitter:description" content="A set of PyTorch implementations/tutorials of popular gradient-descent-based optimizers. Currently includes the Adam, AMSGrad and RAdam optimizers."/>
<meta name="twitter:title" content="Noam optimizer from the Attention is All You Need paper"/>
<meta name="twitter:description" content="This is a tutorial/implementation of the Noam optimizer, which has a warm-up period followed by a learning rate that decays with the inverse square root of the step number."/>
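The schedule described above can be sketched in a few lines of pure Python (a minimal illustration, not the page's actual code; the `d_model` and `warmup` defaults are the values used in the Attention is All You Need paper, assumed here):

```python
import math

def noam_lr(step: int, d_model: int = 512, warmup: int = 4000) -> float:
    """Noam learning-rate schedule: linear warm-up for `warmup` steps,
    then decay proportional to 1 / sqrt(step)."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

During warm-up the second term of the `min` dominates (learning rate grows linearly); afterwards the first term takes over and the rate decays as the inverse square root of the step.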
<meta name="twitter:description" content="This is a PyTorch implementation/tutorial of Deep Q Networks (DQN) from the paper Playing Atari with Deep Reinforcement Learning. It includes a dueling network architecture, a prioritized replay buffer and double-Q-network training."/>
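The double-Q-network training mentioned above decouples action selection from action evaluation. A minimal sketch of the target computation (illustrative only; the function name and list-based Q-values are assumptions, not from the implementation):

```python
def double_q_target(reward: float, done: bool,
                    next_q_online: list, next_q_target: list,
                    gamma: float = 0.99) -> float:
    """Double-Q target: the online network picks the greedy action,
    the target network evaluates it. This reduces over-estimation
    compared with taking max over the target network alone."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```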
<meta name="twitter:description" content="This is a collection of PyTorch implementations/tutorials of reinforcement learning algorithms. It currently includes Proximal Policy Optimization, Generalized Advantage Estimation, and Deep Q Networks."/>
<meta name="twitter:description" content="This is an annotated PyTorch implementation of Sketch RNN from the paper A Neural Representation of Sketch Drawings. Sketch RNN is a sequence-to-sequence model that generates sketches of objects such as bicycles, cats, etc."/>
<meta name="twitter:title" content="Gated Linear Units and Variants"/>
<meta name="twitter:description" content="Train an auto-regressive transformer with Gated Linear Units and variants for the position-wise feedforward network (FFN)."/>
<meta name="description" content="Train an auto-regressive transformer with Gated Linear Units and variants for the position-wise feedforward network (FFN)."/>
<meta property="og:title" content="Gated Linear Units and Variants"/>
<meta property="og:description" content="Train an auto-regressive transformer with Gated Linear Units and variants for the position-wise feedforward network (FFN)."/>
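The GLU variants used in the FFN gate one linear projection of the input with an activation of another. The element-wise gating can be sketched as follows (a simplified per-element illustration under the formulation in Shazeer's GLU Variants Improve Transformer; the bias terms and output projection are omitted):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gelu(x: float) -> float:
    # Exact GELU via the error function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x: float) -> float:
    # a.k.a. Swish
    return x * sigmoid(x)

# `gate` is one linear projection of the input (x @ W) and `value` the
# other (x @ V); the FFN applies the gating element-wise.
def glu(gate: float, value: float) -> float:
    return sigmoid(gate) * value

def geglu(gate: float, value: float) -> float:
    return gelu(gate) * value

def swiglu(gate: float, value: float) -> float:
    return silu(gate) * value
```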
<p>Here’s a notebook for training a GPT model on the Tiny Shakespeare dataset.</p>
<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/gpt/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"/></a></p>
<meta name="twitter:title" content="Evaluate k-nearest neighbor language model"/>
<meta name="twitter:description" content="This runs the kNN model and merges the kNN results with the transformer output to achieve better results than using the transformer alone."/>
<meta name="twitter:title" content="k-Nearest Neighbor Language Models"/>
<meta name="twitter:description" content="This is a simple PyTorch implementation/tutorial of the paper Generalization through Memorization: Nearest Neighbor Language Models, using FAISS. It runs a kNN model on the final transformer layer embeddings to improve the loss of transformer-based language models. It's also great for domain adaptation without pre-training."/>
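The merging step described above is a simple linear interpolation between the kNN distribution and the language model's distribution. A minimal sketch (dict-based distributions and the function name are illustrative assumptions; the interpolation weight lambda is a tuned hyperparameter):

```python
def interpolate(p_knn: dict, p_lm: dict, lam: float = 0.25) -> dict:
    """kNN-LM next-token distribution:
    p(w) = lam * p_knn(w) + (1 - lam) * p_lm(w)."""
    vocab = set(p_knn) | set(p_lm)
    return {w: lam * p_knn.get(w, 0.0) + (1.0 - lam) * p_lm.get(w, 0.0)
            for w in vocab}
```

Because both inputs are probability distributions, the mixture is one as well, for any lambda in [0, 1].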
<meta name="twitter:description" content="This is an implementation of label smoothing loss, which can be used as an alternative to cross-entropy loss for improved accuracy."/>
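Label smoothing replaces the one-hot target with a softened distribution before taking cross entropy. A minimal pure-Python sketch (illustrative, not the page's implementation; it uses the common convention of mixing the one-hot target with a uniform distribution):

```python
import math

def label_smoothing_loss(log_probs: list, target: int, eps: float = 0.1) -> float:
    """Cross entropy against a smoothed target distribution:
    q(k) = (1 - eps) * onehot(k) + eps / K, i.e. the true class keeps
    most of the mass and eps is spread uniformly over all K classes."""
    k = len(log_probs)
    smoothed = [eps / k + ((1.0 - eps) if i == target else 0.0) for i in range(k)]
    return -sum(q * lp for q, lp in zip(smoothed, log_probs))
```

With `eps = 0` this reduces to ordinary cross-entropy loss on the true class.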
<meta name="twitter:title" content="Transformer Encoder and Decoder Models"/>
<meta name="twitter:description" content="These are PyTorch implementations of Transformer-based encoder and decoder models, as well as other related modules."/>
<meta name="twitter:description" content="An implementation, with explanation, of the fixed positional encodings described in the paper Attention is All You Need."/>
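The fixed encodings from the paper interleave sines and cosines at geometrically spaced frequencies: PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). A minimal pure-Python sketch (a real implementation would precompute these as a tensor):

```python
import math

def positional_encoding(pos: int, d_model: int) -> list:
    """Fixed sinusoidal positional encoding for a single position:
    even indices get sin, odd indices get cos, with the wavelength
    growing geometrically from 2*pi to 10000 * 2*pi."""
    pe = []
    for i in range(0, d_model, 2):
        angle = pos / 10000 ** (i / d_model)
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:d_model]  # trim the extra cos when d_model is odd
```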