A personal list of conferences, journals, and papers I use for my research and projects, with one-sentence summaries.

## Machine Learning Research Conferences and Journals

- ICLR
- IJCAI
- JAIR
- NIPS (now NeurIPS)
- Journal of Machine Learning Research
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Artificial Intelligence
- Machine Learning

### Deep Reinforcement Learning

- Playing Atari with Deep Reinforcement Learning
- Continuous Control with Deep Reinforcement Learning
- Deterministic Policy Gradient Algorithms
- Actor-Critic Methods
- Summary: an actor neural network determines the actions (the student) while a critic neural network evaluates the actor’s actions (the teacher)
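A minimal numpy sketch of the actor-critic idea on a toy two-armed bandit (all sizes, rewards, and learning rates here are illustrative, not from any paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-armed bandit: the actor (student) picks actions, the critic
# (teacher) evaluates them via a TD error.
n_actions = 2
actor_logits = np.zeros(n_actions)
critic_value = 0.0
alpha_actor, alpha_critic = 0.1, 0.1
true_rewards = np.array([0.0, 1.0])   # action 1 is the better arm

for _ in range(500):
    probs = np.exp(actor_logits) / np.exp(actor_logits).sum()
    a = rng.choice(n_actions, p=probs)
    r = true_rewards[a]
    td_error = r - critic_value            # critic's evaluation of the action
    critic_value += alpha_critic * td_error
    grad = -probs.copy()
    grad[a] += 1.0                         # d log pi(a) / d logits
    actor_logits += alpha_actor * td_error * grad

probs = np.exp(actor_logits) / np.exp(actor_logits).sum()
```

The TD error plays both roles: it trains the critic's value estimate and scales the actor's policy-gradient step.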

- Progressive Neural Networks, Reinforcement Learning Context
- Summary: adding a new column of weights for each new task yields better transfer learning than partial or complete fine-tuning, which causes catastrophic forgetting
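The column mechanism can be sketched in a few lines of numpy (layer sizes and the single lateral connection are illustrative; the paper uses many layers and adapters):

```python
import numpy as np

rng = np.random.default_rng(1)

# Column 1 was trained on task A and is now frozen; column 2 is trained on
# task B and reads column 1's features through a lateral connection.
W1 = rng.normal(size=(4, 3))           # frozen column-1 hidden layer
W2 = rng.normal(size=(4, 3))           # fresh column-2 hidden layer
U21 = rng.normal(size=(4, 4))          # lateral: column-1 hidden -> column-2 hidden

x = rng.normal(size=3)
h1 = np.maximum(0, W1 @ x)             # frozen features: reused, never overwritten
h2 = np.maximum(0, W2 @ x + U21 @ h1)  # new column consumes old features laterally
```

Because W1 is never updated, task A's behavior is preserved exactly while task B still benefits from its features.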

### Deep Convolutional Neural Networks

- Wide Residual Networks
- Summary: a variation of residual networks where increasing width rather than depth gives better performance
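A dense-layer sketch of a widened residual block, assuming illustrative sizes (the paper uses convolutions; here the point is just the widening factor `k` on the residual branch):

```python
import numpy as np

rng = np.random.default_rng(2)

def residual_block(x, W_in, W_out):
    # out = x + F(x): the residual branch is widened, the skip path is not
    return x + W_out @ np.maximum(0, W_in @ x)

d, k = 8, 4                            # base width and widening factor k
W_in = rng.normal(size=(d * k, d)) * 0.1
W_out = rng.normal(size=(d, d * k)) * 0.1
x = rng.normal(size=d)
y = residual_block(x, W_in, W_out)

# Widening by k multiplies the block's parameter count by k:
n_params = W_in.size + W_out.size      # 2 * k * d**2
```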

- SqueezeNet
- Summary: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
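Where the savings come from can be sketched with a toy fire module (1x1 convolutions only; the real module also has a 3x3 expand path, and these channel counts are illustrative, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Fire-module sketch: squeeze to few channels, then expand back out.
c_in, s, e = 64, 8, 32                 # input, squeeze, expand channels
H = W = 4
x = rng.normal(size=(c_in, H, W))

squeeze = rng.normal(size=(s, c_in)) * 0.1   # 1x1 conv == per-pixel matmul
expand = rng.normal(size=(e, s)) * 0.1

h = np.maximum(0, np.einsum('sc,chw->shw', squeeze, x))
y = np.maximum(0, np.einsum('es,shw->ehw', expand, h))

# The narrow squeeze layer is where the parameter savings come from:
params_fire = squeeze.size + expand.size     # 8*64 + 32*8 = 768
params_direct = c_in * e                     # a direct 1x1 conv: 64*32 = 2048
```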

### Deep Neural Networks

- A shared neural ensemble links distinct contextual memories encoded close in time
- Summary: spatial memories acquired close together in time are associated with overlapping neuronal ensembles in the brain’s hippocampus

- Memories linked within a window of time
- Summary: a theory called the temporal context model (TCM) explains why people remember words that occur close together in a list better than words that are further apart

- Learning Step Size Controllers for Robust Neural Network Training
- Summary: identifying informative features of the training state, learning a step-size controller from them, and showing that the controller generalizes to different tasks

- Weight Features for Predicting Future Model Performance of Deep Neural Networks
- Summary: using summary statistics of the weights, rather than the raw weights, to predict future model performance

- Compete to Compute
- Summary: using competing linear units to outperform non-competing nonlinear units and avoid catastrophic forgetting when training sets change over time
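The competition is local winner-take-all: within each small group of linear units, only the largest activation passes through. A minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(4)

def lwta(z, group_size=2):
    # Local winner-take-all: within each group of linear units, only the
    # largest activation survives; the rest are set to zero.
    z = z.reshape(-1, group_size)
    mask = z == z.max(axis=1, keepdims=True)
    return (z * mask).ravel()

W = rng.normal(size=(8, 5))
x = rng.normal(size=5)
h = lwta(W @ x)   # competing linear units; no explicit nonlinearity applied
```

The nonlinearity comes entirely from which unit wins, not from squashing the values, and losing units keep their weights untouched, which is what helps against forgetting.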

- HyperNetworks
- Summary: using a HyperLSTMCell instead of a BasicLSTMCell, where a small LSTM generates the parameters of a larger LSTM
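The generating-network idea, stripped to numpy (a shared generator plus tiny per-layer embeddings emit the weights of a larger network; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypernetwork sketch: a small parameter set (shared matrix H plus a tiny
# per-layer embedding) generates the weight matrices of a larger network.
n_layers, d, z_dim = 6, 16, 4
H = rng.normal(size=(d * d, z_dim)) * 0.05   # shared generator
Z = rng.normal(size=(n_layers, z_dim))       # one small embedding per layer
Ws = [(H @ Z[l]).reshape(d, d) for l in range(n_layers)]

n_trainable = H.size + Z.size                # 1024 + 24 = 1048
n_generated = n_layers * d * d               # 1536 weights actually used
```

Only `H` and `Z` are trained; the generated matrices are functions of them, which is where the compression comes from.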

- Non-Local Interaction via Diffusible Resource Prevents Coexistence of Cooperators and Cheaters in a Lattice Model

- Decoupled Neural Interfaces using Synthetic Gradients
- Summary: by modelling error gradients (synthetic gradients), we can decouple subgraphs and update them independently and asynchronously
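A toy sketch of the mechanism: a linear module `M` predicts dL/dh from h, so the first layer updates without waiting for backprop from the loss (in the paper the modules also condition on labels and updates run asynchronously; sizes and rates here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

W1 = rng.normal(size=(4, 3)) * 0.5
W2 = rng.normal(size=(1, 4)) * 0.5
M = np.zeros((4, 4))                  # linear synthetic-gradient module
lr = 0.05

for _ in range(200):
    x = rng.normal(size=3)
    h = np.tanh(W1 @ x)
    g_syn = M @ h                                  # predicted dL/dh
    W1 -= lr * np.outer(g_syn * (1 - h**2), x)     # decoupled update, no waiting
    y = W2 @ h
    target = x.sum()
    g_true = (W2.T @ (y - target)).ravel()         # true dL/dh, L = 0.5*(y-t)^2
    M -= lr * np.outer(g_syn - g_true, h)          # train M toward the truth
```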

- Distilling the Knowledge in a Neural Network
- Summary: using soft targets instead of hard targets, a much smaller network can achieve performance similar to the large network from which the soft targets were learned
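The soft targets come from a temperature-scaled softmax over the teacher's logits; higher temperature exposes the relative similarities between classes that a near-one-hot output hides. A minimal sketch (logits are illustrative):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: larger T spreads probability mass out.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([10.0, 5.0, 1.0])

hard = softmax(teacher_logits, T=1.0)  # nearly one-hot: hides class similarity
soft = softmax(teacher_logits, T=5.0)  # soft targets expose relative similarity
```

The student is trained to match `soft` (at the same temperature) rather than the one-hot labels alone.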

### Hyper-parameter Optimization

- Learning to learn by gradient descent by gradient descent
- Summary: learning an optimization algorithm that works on a class of optimization problems by parameterizing the optimizer

- Direct Feedback Alignment Provides Learning in Deep Neural Networks
- Summary: an alternative to error backpropagation by propagating the error through fixed random feedback connections directly from the output layer to each hidden layer
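A toy numpy sketch of the mechanism: the output error reaches the hidden layer through a fixed random matrix instead of the transposed forward weights (network size, task, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

W1 = rng.normal(size=(6, 3)) * 0.5
W2 = rng.normal(size=(2, 6)) * 0.5
B1 = rng.normal(size=(6, 2))          # fixed random feedback, never trained
lr = 0.02
losses = []

for _ in range(1000):
    x = rng.normal(size=3)
    target = np.array([x[0], -x[1]])  # toy regression task
    h = np.tanh(W1 @ x)
    y = W2 @ h
    e = y - target
    losses.append(0.5 * float(e @ e))
    W1 -= lr * np.outer((B1 @ e) * (1 - h**2), x)   # DFA: B1 @ e, not W2.T @ e
    W2 -= lr * np.outer(e, h)                        # output layer: true gradient
```

The output layer still gets the true gradient; only the hidden layers receive the error through the fixed random projections.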

- DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
- Summary: approximating the training trajectory with a convex combination of the starting and ending weights, making the reverse pass of gradient-based hyperparameter optimization much cheaper

- Gradient-based Hyperparameter Optimization through Reversible Learning
- Summary: tuning hyperparameters by casting them as a learning problem

### Deep Recurrent Neural Networks

- HyperNetworks
- Summary: using a small LSTM to generate a large LSTM for substantial model compression

- Exploring Sparsity in Recurrent Neural Networks
- Summary: pruning weights during the initial training of the network reduces model size by 90% and yields a 2× to 7× speed-up while maintaining accuracy
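The core operation is magnitude pruning; a one-shot sketch of the idea (the paper prunes gradually with a rising threshold during training; the matrix size here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

# Zero out the smallest 90% of weights by absolute value.
W = rng.normal(size=(100, 100))
sparsity = 0.9
threshold = np.quantile(np.abs(W), sparsity)
mask = np.abs(W) >= threshold          # keep only the largest 10%
W_pruned = W * mask

density = mask.mean()                  # fraction of weights kept, ~0.1
```

The zeroed weights can then be skipped at inference time, which is where the size reduction and speed-up come from.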