We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization, all without task-specific training.
Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, along with a technical paper.
GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains.
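Concretely, this objective is ordinary maximum-likelihood next-token prediction: at every position the model is asked to assign high probability to the word that actually comes next. The sketch below is a minimal illustration of that training signal in PyTorch; the tiny recurrent model, vocabulary size, and random token batch are stand-ins for illustration only, not GPT-2's architecture or code.

```python
# Minimal sketch of the next-token prediction objective (toy stand-in model, not GPT-2).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLM(nn.Module):
    """Placeholder for the transformer: embeds tokens and scores the next token."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for transformer blocks
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # logits for the next token at every position

model = ToyLM()
tokens = torch.randint(0, 1000, (2, 16))   # a batch of token ids (random, for illustration)
logits = model(tokens[:, :-1])             # predictions conditioned on all previous tokens
targets = tokens[:, 1:]                    # the actual "next word" at each position
loss = F.cross_entropy(logits.reshape(-1, 1000), targets.reshape(-1))
loss.backward()                            # standard maximum-likelihood training step
```

Because the 8-million-page dataset is so diverse, optimizing this one loss implicitly exposes the model to examples of translation, summarization, and question answering as they naturally occur in web text.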