Categories
AI dataset

We are leading

As promised, more data from the blogs and education category is now in our granary! To give you an idea of the task we’re facing, the data from this category alone comes to 2.9 million files, and that’s just a fraction of what we’ve collected. We have also added a set of data from job listings. As a result, our project currently holds the largest amount of Polish data!

Categories
AI dataset

Happy Easter!

We wish you much peace and joy in the coming days!

In the meantime, we can report the import of more data. As promised, another batch comes from the blogs and education category, which, together with the previous texts, gives us more than 145 GB of text data. You can see more details on our dashboard: Speakleash Dashboard – Streamlit

Happy Easter!

Categories
AI dataset

141GB

Another 3 datasets are already in our granary! They come from general media and from blog-related sites. Our collection currently stands at 141GB, and you can be sure that more data from these areas, media and blogs, will follow in the near future.
The pie chart below shows the share of each category.

Categories
AI dataset

We don’t stop

We have big plans and an amazing team, but there is too much data for our current team to reach our ambitious goal by the deadline.

So, if you know Python and love data, please write to us. We need your help right now!

To end on a positive note, another 6GB from the legal category is already in SpeakLeash. For more details, visit our dashboard (https://speakleash.streamlit.app/).

Categories
AI dataset

Spring has come!

We welcome spring with more great news! Thanks to new data from the media and online stores categories, we have exceeded 120GB of data! A big thank you to the whole team for their hard work, which is an inspiration to all of us.
How much do you think we will be able to collect this spring?

Categories
AI dataset

Another milestone

After months of research and talks, we can say we have reached a milestone in our mission: over 100GB of pure text data! It includes Wikipedia, theses and novels. What do you think about it? What data would you like to add to train the first Polish GPT? Don’t hesitate to take a look here: https://speakleash.streamlit.app/.

Categories
AI dataset

BIG ANNOUNCEMENT!!

From now on, you can see a live dashboard on our webpage extension (https://speakleash.streamlit.app/)! It lets you track how our work is going, from the volume of data to its distribution across industries and much more. You can also apply filters to fit your needs. If you have any questions about the dashboard or SpeakLeash in general, don’t hesitate to ask.

Categories
Conference

PyTech Summit 2022

If you want to learn more about SpeakLeash, make sure to join the PyTech Summit 2022 Winter edition conference (online, 08.12.2022), where one of our founders, Sebastian Kondracki, will have the opportunity to talk about us. You can get your free tickets here.

Categories
AI dataset

Social & GitHub are live!

We are happy to announce that our social platforms & GitHub are live! You can find the links in the Community & Contact section. If you want to stay updated on our progress, make sure to follow us.