corpus
social media dialogues
contemporary Romanian
folklore
treebank
syntax
Romanian language
chat
social media
poetry
nonstandard Romanian
Access to this resource is restricted. To download it, contact a member of the team.
Authors: Cătălina Mărănduc, Augusto Perez
A balanced treebank. Document titles contain the name of the subcorpus: CHAT (social media, 2 XML files), 2,500 sentences; CONT (contemporary, 9 XML files), 8,444 sentences; OLD (old language, 33 files), 20,000 sentences; and POP (folklore, 5 XML files), 25,000 sentences. Total size of the resource: 38,600 manually annotated sentences.
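The subcorpus can be read off a document title, for instance to group files before processing. The sketch below assumes that the subcorpus name is the part of the treebank id before the first underscore, as in the two ids shown in the examples (CONT_1984_orwel and OLD_XVI_CORESI_Prav_1560). The names SUBCORPORA and subcorpus_of are hypothetical and not part of the resource.

```python
# Subcorpus names and their descriptions, as listed in the resource description.
SUBCORPORA = {
    "CHAT": "social media",
    "CONT": "contemporary",
    "OLD": "old language",
    "POP": "folklore",
}


def subcorpus_of(treebank_id):
    """Return the subcorpus description for a treebank id, or None if unknown.

    Assumption: the subcorpus name is the text before the first underscore.
    """
    prefix = treebank_id.split("_", 1)[0]
    return SUBCORPORA.get(prefix)


if __name__ == "__main__":
    for tid in ("CONT_1984_orwel", "OLD_XVI_CORESI_Prav_1560"):
        print(tid, "->", subcorpus_of(tid))
```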
Example 1. Contemporary corpus: 1984, George Orwell
<treebank id="CONT_1984_orwel">
<sentence id="8" parser="" user="augusto" date="2016-05-27">
<word id="1" form="Pe" lemma="pe" postag="Spsa" head="15" chunk="" deprel="c.c.l."/>
<word id="2" form="fiecare" lemma="fiecare" postag="Di3-sr" head="3" chunk="" deprel="a.adj."/>
<word id="3" form="palier" lemma="palier" postag="Ncmsrn" head="1" chunk="" deprel="prep."/>
<word id="4" form="," lemma="," postag="COMMA" head="5" chunk="" deprel="punct."/>
<word id="5" form="așezată" lemma="așeza" postag="Vmp--sf-p--r" head="15" chunk="" deprel="el.pred."/>
<word id="6" form="faţă în faţă" lemma="faţă_în_faţă" postag="Rg" head="5" chunk="" deprel="c.c.l."/>
<word id="7" form="cu" lemma="cu" postag="Spsa" head="6" chunk="" deprel="c.c.soc."/>
<word id="8" form="ușa" lemma="ușă" postag="Ncfsry" head="7" chunk="" deprel="prep."/>
<word id="9" form="liftului" lemma="lift" postag="Ncmsoy" head="8" chunk="" deprel="a.subst."/>
<word id="10" form="," lemma="," postag="COMMA" head="5" chunk="" deprel="punct."/>
<word id="11" form="figura" lemma="figură" postag="Ncfsry" head="15" chunk="" deprel="sbj."/>
<word id="12" form="cea" lemma="cel" postag="Tdfsr" head="13" chunk="" deprel="det."/>
<word id="13" form="enormă" lemma="enorm" postag="Afpfsrn" head="11" chunk="" deprel="a.adj."/>
<word id="14" form="îl" lemma="el" postag="Pp3msa--------w" head="15" chunk="" deprel="c.d."/>
<word id="15" form="privea" lemma="privi" postag="Vmii3s" head="0" chunk=""/>
<word id="16" form="fix" lemma="fix" postag="Rg" head="15" chunk="" deprel="c.c.m."/>
<word id="17" form="din" lemma="din" postag="Spca" head="15" chunk="" deprel="c.c.l."/>
<word id="18" form="perete" lemma="perete" postag="Ncmsrn" head="17" chunk="" deprel="prep."/>
<word id="19" form="." lemma="." postag="PERIOD" head="15" chunk="" deprel="punct."/>
</sentence>
....
</treebank>
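A file in this format can be read with a standard XML parser. The following is a minimal sketch, using Python's xml.etree.ElementTree, that loads the sentences, finds the root word (the one with head="0") and lists its direct dependents. The sample is an excerpt of the sentence above (words 11 to 16 only), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Excerpt of sentence 8 from Example 1 (words 11 to 16 only).
SAMPLE = """<treebank id="CONT_1984_orwel">
<sentence id="8" parser="" user="augusto" date="2016-05-27">
<word id="11" form="figura" lemma="figură" postag="Ncfsry" head="15" chunk="" deprel="sbj."/>
<word id="12" form="cea" lemma="cel" postag="Tdfsr" head="13" chunk="" deprel="det."/>
<word id="13" form="enormă" lemma="enorm" postag="Afpfsrn" head="11" chunk="" deprel="a.adj."/>
<word id="14" form="îl" lemma="el" postag="Pp3msa--------w" head="15" chunk="" deprel="c.d."/>
<word id="15" form="privea" lemma="privi" postag="Vmii3s" head="0" chunk=""/>
<word id="16" form="fix" lemma="fix" postag="Rg" head="15" chunk="" deprel="c.c.m."/>
</sentence>
</treebank>"""


def read_sentences(xml_text):
    """Yield (sentence id, list of word attribute dicts) for each sentence."""
    root = ET.fromstring(xml_text)
    for sentence in root.iter("sentence"):
        words = [dict(word.attrib) for word in sentence.iter("word")]
        yield sentence.get("id"), words


def find_root(words):
    """Return the word whose head is "0", i.e. the root of the sentence."""
    return next(w for w in words if w["head"] == "0")


def dependents(words, head_id):
    """Return the words attached directly to the word with the given id."""
    return [w for w in words if w["head"] == head_id]


if __name__ == "__main__":
    for sentence_id, words in read_sentences(SAMPLE):
        root = find_root(words)
        print("sentence", sentence_id, "root:", root["form"])
        for w in dependents(words, root["id"]):
            # The root carries no deprel attribute, so use .get() when reading it.
            print("  ", w["form"], w.get("deprel"))
```

Attribute values are kept as strings here, so ids and heads are compared as text; convert them with int() if numeric ordering matters.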
Example 2. Old-language corpus, 16th century, Pravila lui Coresi, 1560
<treebank id="OLD_XVI_CORESI_Prav_1560">
...
<sentence id="2" parser="Victoria's parser" user="ugla" date="2020-27-23">
<word id="1" form="Nu" lemma="nu" postag="Qz" head="2" chunk="" deprel="neg."/>
<word id="2" form="priimeşti" lemma="priimeşti" postag="Vmip2s" head="0" chunk=""/>
<word id="3" form="Dumnezeu" lemma="Dumnezeu" postag="Npmsrn" head="2" chunk="" deprel="sbj."/>
<word id="4" form="," lemma="," postag="COMMA" head="5" chunk="" deprel="punct."/>
<word id="5" form="ce" lemma="ce" postag="Ccssp" head="2" chunk="" deprel="coord."/>
<word id="6" form="priimeaşte" lemma="primi" postag="Vmip3s" head="5" chunk="" deprel="coord."/>
<word id="7" form="Dumnezeul" lemma="Dumnezeu" postag="Npmsry" head="6" chunk="" deprel="sbj."/>
<word id="8" form="acela" lemma="acela" postag="Dd3msr---o" head="6" chunk="" deprel="c.d."/>
<word id="9" form="ce" lemma="ce" postag="Pw3--r" head="8" chunk="" deprel="a.vb."/>
<word id="10" form="se" lemma="sine" postag="Px3--a--------w" head="11" chunk="" deprel="refl."/>
<word id="11" form="roagă" lemma="ruga" postag="Vmip3s" head="9" chunk="" deprel="subord."/>
<word id="12" form="bine" lemma="bine" postag="Rg" head="11" chunk="" deprel="c.c.m."/>
<word id="13" form="." lemma="." postag="PERIOD" head="2" chunk="" deprel="punct."/>
</sentence>
....
</treebank>
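Because every word points to its head by id, a sentence can be checked for well-formedness before further use. The sketch below is one possible check, not a tool shipped with the resource: it verifies that there is exactly one root (head="0"), that every head refers to an existing word, and that following heads never loops. The sample is the sentence from Example 2 with the elided parts omitted, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sentence 2 from Example 2, copied as shown; elided parts of the file are omitted.
SAMPLE = """<treebank id="OLD_XVI_CORESI_Prav_1560">
<sentence id="2" parser="Victoria's parser" user="ugla" date="2020-27-23">
<word id="1" form="Nu" lemma="nu" postag="Qz" head="2" chunk="" deprel="neg."/>
<word id="2" form="priimeşti" lemma="priimeşti" postag="Vmip2s" head="0" chunk=""/>
<word id="3" form="Dumnezeu" lemma="Dumnezeu" postag="Npmsrn" head="2" chunk="" deprel="sbj."/>
<word id="4" form="," lemma="," postag="COMMA" head="5" chunk="" deprel="punct."/>
<word id="5" form="ce" lemma="ce" postag="Ccssp" head="2" chunk="" deprel="coord."/>
<word id="6" form="priimeaşte" lemma="primi" postag="Vmip3s" head="5" chunk="" deprel="coord."/>
<word id="7" form="Dumnezeul" lemma="Dumnezeu" postag="Npmsry" head="6" chunk="" deprel="sbj."/>
<word id="8" form="acela" lemma="acela" postag="Dd3msr---o" head="6" chunk="" deprel="c.d."/>
<word id="9" form="ce" lemma="ce" postag="Pw3--r" head="8" chunk="" deprel="a.vb."/>
<word id="10" form="se" lemma="sine" postag="Px3--a--------w" head="11" chunk="" deprel="refl."/>
<word id="11" form="roagă" lemma="ruga" postag="Vmip3s" head="9" chunk="" deprel="subord."/>
<word id="12" form="bine" lemma="bine" postag="Rg" head="11" chunk="" deprel="c.c.m."/>
<word id="13" form="." lemma="." postag="PERIOD" head="2" chunk="" deprel="punct."/>
</sentence>
</treebank>"""


def heads_of(sentence):
    """Map each word id to its head id, both as integers."""
    return {int(w.get("id")): int(w.get("head")) for w in sentence.iter("word")}


def check_tree(heads):
    """Return a list of problems; an empty list means the heads form a tree."""
    problems = []
    roots = [word for word, head in heads.items() if head == 0]
    if len(roots) != 1:
        problems.append(f"expected 1 root, found {len(roots)}")
    for word, head in heads.items():
        if head != 0 and head not in heads:
            problems.append(f"word {word} points to missing head {head}")
    for start in heads:
        seen = set()
        node = start
        # Walk up towards the root; revisiting a word means there is a cycle.
        while node != 0 and node in heads:
            if node in seen:
                problems.append(f"cycle reached from word {start}")
                break
            seen.add(node)
            node = heads[node]
    return problems


if __name__ == "__main__":
    treebank = ET.fromstring(SAMPLE)
    for sentence in treebank.iter("sentence"):
        issues = check_tree(heads_of(sentence))
        print("sentence", sentence.get("id"), "OK" if not issues else issues)
```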