Dark-Existed's Blog

Review the old to learn the new


2022

Visual Studio Code Remote Tunnel Deployment

2022-12-14

Home Network Setup

2022-09-06

LoRA: Low-Rank Adaptation of Large Language Models

2022-04-06

cosFormer: Rethinking Softmax in Attention

2022-03-30

Mobile Development Basics (Android & Flutter)

2022-03-25

Transformer Quality in Linear Time

2022-03-16

DeepNet: Scaling Transformers to 1,000 Layers

2022-03-10

2021

Predicting Attention Sparsity in Transformers

2021-10-04

R-Drop: Regularized Dropout for Neural Networks

2021-08-01

ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding

2021-07-19

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

2021-07-19

Parameter-Efficient Transfer Learning for NLP

2021-07-10

ERNIE: Enhanced Representation through Knowledge Integration

2021-07-10

How Can We Know What Language Models Know?

2021-07-10

FNet: Mixing Tokens with Fourier Transforms

2021-06-30

On Layer Normalization in the Transformer Architecture

2021-03-23

Understanding the Difficulty of Training Transformers

2021-03-22

2018

Hello Gridea

2018-12-12