LocalDoc-Azerbaijan/AzTreeBank

AzTreeBank

AzTreeBank is a syntactically annotated treebank for the Azerbaijani language, following the Universal Dependencies guidelines.

Data Sources

The data in AzTreeBank was collected from a variety of sources, including:

  • Books
  • Wikipedia
  • News websites (sports, politics, and other topics)
  • Scientific and literary articles

Data Generation

Annotations in AzTreeBank were generated automatically using the GPT-4o model, providing wide coverage of syntactic structures of the Azerbaijani language.

Authors

AzTreeBank was developed and maintained by the LocalDoc team.

License

This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You are free to share and adapt the material, provided you give appropriate credit and do not use it for commercial purposes.

Language

The corpus is entirely in Azerbaijani.

Statistics

  • Sentences: 94,246
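
The sentence count can be checked against a local copy of the data. The sketch below assumes the files use the CoNLL-U format that is standard for Universal Dependencies treebanks (this README does not name the file format), where sentences are separated by blank lines and comment lines start with `#`. The function name `count_sentences` and the sample text are hypothetical placeholders, not part of the corpus or its tooling.

```python
def count_sentences(conllu_text: str) -> int:
    """Count sentences in CoNLL-U text.

    A sentence is a blank-line-separated block that contains
    at least one non-comment (token) line.
    """
    count = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line closes the current sentence block.
            in_sentence = False
        elif not stripped.startswith("#") and not in_sentence:
            # First token line of a new block: count one sentence.
            count += 1
            in_sentence = True
    return count


# Hand-written placeholder with two one-token sentences (not corpus data).
SAMPLE = (
    "# sent_id = 1\n"
    "1\tword\tword\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "1\tword\tword\tNOUN\t_\t_\t0\troot\t_\t_\n"
)

print(count_sentences(SAMPLE))
```

To apply it to a real file, read the file as UTF-8 text and pass the contents to `count_sentences`.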

Annotation

The annotations include part-of-speech (POS) tags, morphological features, and syntactic dependency relations, following the Universal Dependencies schema.
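
As an illustration of how these three annotation layers fit together, here is a minimal sketch that reads one sentence and extracts the POS tag, morphological features, and dependency relation of each token. It assumes the ten-column CoNLL-U layout used by Universal Dependencies treebanks; this README does not confirm the file format. The names `Token`, `parse_feats`, and `parse_sentence` are hypothetical, and the example sentence with its annotations is hand-written for illustration, not taken from AzTreeBank.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str     # universal POS tag
    feats: dict   # morphological features, e.g. {"Case": "Nom"}
    head: int     # id of the syntactic head (0 means root)
    deprel: str   # dependency relation to the head


def parse_feats(raw: str) -> dict:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_sentence(block: str) -> list:
    """Parse the token lines of one CoNLL-U sentence block."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            parse_feats(cols[5]), int(cols[6]), cols[7]))
    return tokens


# Hypothetical example sentence ("I am reading a book."); annotations are illustrative.
ROWS = [
    ("1", "Mən", "mən", "PRON", "_", "Case=Nom|Number=Sing|Person=1", "3", "nsubj", "_", "_"),
    ("2", "kitab", "kitab", "NOUN", "_", "Case=Nom|Number=Sing", "3", "obj", "_", "_"),
    ("3", "oxuyuram", "oxu", "VERB", "_", "Number=Sing|Person=1|Tense=Pres", "0", "root", "_", "SpaceAfter=No"),
    ("4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
]
EXAMPLE = "# text = Mən kitab oxuyuram.\n" + "\n".join("\t".join(r) for r in ROWS)

for tok in parse_sentence(EXAMPLE):
    print(tok.id, tok.form, tok.upos, tok.feats, tok.head, tok.deprel)
```

Because the annotations were generated automatically, a parser like this is also a convenient place to add sanity checks, for example that every sentence has exactly one token with head 0.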

Contact

For any inquiries or further information, please contact the LocalDoc team at v.resad.89@gmail.com.
