SpeakLeash a.k.a. Spichlerz!

An open collaboration project to build a dataset for language modeling of at least 1 TB, composed of diverse texts in Polish. Our aim is to enable machine learning research and to train a Generative Pre-trained Transformer (GPT) model on the collected data.


Latest news:

SUMMARY OF THE YEAR 2023 (5 January 2024)
The SpeakLeash dataset of Polish-language texts has grown by over 470 GB in 3.5 months! (17 December 2023)
Pytech Summit 2023 (4 December 2023)
New podcast episode! (20 June 2023)
We are partners! (14 June 2023)

For more news, check out our Twitter: Speak_Leash