Jyväskylän yliopiston opinnäytetöitä (University of Jyväskylä theses)
Data matrix for training XMTC machine learning models (TF-IDF) with Voikko lemmatisation (fi)
Data matrix for training XMTC machine learning models (TF-IDF) with Voikko lemmatisation, based on a Finnish corpus. The text data follows the Bag-of-Words feature file format of The Extreme Classification Repository (http://manikvarma.org/downloads/XC/XMLRepository.html).
The first line is formatted as:
total_documents number_of_features number_of_labels
All other lines represent one document per line:
label1,label2,...,labelk ft1:ft1_val ft2:ft2_val ft3:ft3_val .. ftd:ftd_val
i.e., a comma-separated list of labels followed by all non-zero components of the TF-IDF vector, each given as component_number:value.
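The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names (`parse_header`, `parse_document`, `read_matrix`) and the `SAMPLE` text are hypothetical and invented for illustration. It assumes that fields are separated by whitespace, that component numbers are integers, and that a line whose first token already contains `:` is a document without labels. Labels are kept as strings because the description does not state their type.

```python
from typing import Dict, Iterable, List, Tuple

Header = Tuple[int, int, int]
Document = Tuple[List[str], Dict[int, float]]

# Hypothetical three-line sample in the described layout (not real data).
SAMPLE = "2 5 3\n0,2 0:0.5 3:1.25\n1 4:0.75\n"


def parse_header(line: str) -> Header:
    """Parse 'total_documents number_of_features number_of_labels'."""
    total_documents, number_of_features, number_of_labels = (
        int(field) for field in line.split()
    )
    return total_documents, number_of_features, number_of_labels


def parse_document(line: str) -> Document:
    """Parse 'label1,...,labelk ft1:ft1_val ... ftd:ftd_val' into (labels, features)."""
    tokens = line.split()
    labels: List[str] = []
    # Assumption: a first token without ':' is the comma-separated label list.
    if tokens and ":" not in tokens[0]:
        labels = [label for label in tokens[0].split(",") if label]
        tokens = tokens[1:]
    features: Dict[int, float] = {}
    for token in tokens:
        component, value = token.split(":", 1)
        features[int(component)] = float(value)
    return labels, features


def read_matrix(lines: Iterable[str]) -> Tuple[Header, List[Document]]:
    """Read the header line, then one document per non-empty line."""
    iterator = iter(lines)
    header = parse_header(next(iterator))
    documents = [parse_document(line) for line in iterator if line.strip()]
    return header, documents
```

To read the downloaded file, pass an open text file handle, for example `read_matrix(open(path, encoding="utf-8"))`, where `path` is wherever the resource was saved. Only non-zero components are stored, so a dictionary (or a sparse matrix built from it) keeps memory use proportional to the number of stored values.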
Additional information
| Field | Value |
|---|---|
| Format | TXT |
| File size | 68923498 bytes |
| Data status | Current version |
| Temporal Coverage | 01.01.2010 - 31.12.2017 |
| Data last updated | 24 February 2021 |
| Metadata last updated | 24 February 2021 |
| Created | 24 February 2021 |
| SHA256 | 8086b21254abc2278a008bb39a00b07e5db992d7e4b045dc473799e75495d992 |
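
The SHA256 value in the table can be used to check that a download is complete and unmodified. The sketch below is one way to do that with Python's standard `hashlib` module; the helper names (`sha256_of_file`, `verify`) and the chunk size are illustrative choices, and the local file path is whatever name the resource was saved under.

```python
import hashlib

# Checksum as listed in the metadata table above.
EXPECTED_SHA256 = "8086b21254abc2278a008bb39a00b07e5db992d7e4b045dc473799e75495d992"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole file never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify(path)` on the saved file returns `True` only if its digest matches the listed value.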