An open collaboration project to build a language modeling dataset of at least 1 TB, composed of diverse texts in Polish. Our aim is to enable machine learning research and to train a Generative
Pre-trained Transformer model on the collected data.
Latest news:
For more news, check out our Twitter: