2022
Visual Studio Code Remote Tunnel Deployment
2022-12-14
Home Network Setup
2022-09-06
LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS
2022-04-06
COSFORMER: RETHINKING SOFTMAX IN ATTENTION
2022-03-30
Mobile Development Fundamentals (Android & Flutter)
2022-03-25
Transformer Quality in Linear Time
2022-03-16
DeepNet: Scaling Transformers to 1,000 Layers
2022-03-10
2021
Predicting Attention Sparsity in Transformers
2021-10-04
R-Drop: Regularized Dropout for Neural Networks
2021-08-01
ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
2021-07-19
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
2021-07-19
Parameter-Efficient Transfer Learning for NLP
2021-07-10
ERNIE: Enhanced Representation through Knowledge Integration
2021-07-10
How Can We Know What Language Models Know?
2021-07-10
FNet: Mixing Tokens with Fourier Transforms
2021-06-30
On Layer Normalization in the Transformer Architecture
2021-03-23
Understanding the Difficulty of Training Transformers
2021-03-22
2018
Hello Gridea
2018-12-12