What does this tool do?

  • Identifies sentences in uploaded documents that use the indirect voice (indirect, or reported, speech).
  • The tool currently supports Spanish and Latin.

How does the tool do this?

  • Spanish:
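    • Looks for forms of decir ("to say") combined with an indirect pronoun and followed by que (or an abbreviation of it). A sentence that contains one of these phrases is classified as indirect.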
    • Default list of variations of decir with indirect pronouns (can override below): le dixo que, les dixo que, te dixo que, me dixo que, se dixo que, le dixo q, les dixo q, te dixo q, me dixo q, se dixo q, le dijo que, les dijo que, te dijo que, me dijo que, se dijo que, le dijo q, les dijo q, te dijo q, me dijo q, se dijo q, le dicho que, les dicho que, te dicho que, me dicho que, se dicho que, le dicho q, les dicho q, te dicho q, me dicho q, se dicho q, le dexir que, les dexir que, te dexir que, me dexir que, se dexir que, le dexir q, les dexir q, te dexir q, me dexir q, se dexir q, dixol qe, dixoli qe, li dizen qe, dizenli qe, le diz ca, dizeles que, dizele que, l dixo que, diz l que, dixol que, dixole que, dixoli que, quel dieres, le diz que, dizle que, li diz que, ldiz que, dixol’ que
  • Latin:
    • In Latin, the indirect voice is most commonly formed with a primary headverb conjugated normally, a noun in the accusative, and a verb in the infinitive.
    • It commonly appears in one of two orders: infinitive + accusative + headverb (IAH) or headverb + accusative + infinitive (HAI).
    • Default Headverbs (can override below): cognosco, cognoscere, intellego, intellegere, nescio, nescire, scio, scire, arbitror, arbitrari, existimo, existimare, puto, putare, confiteor, confiteri, demonstro, demonstrare, dico, dicere, fateor, fateri, iuro, iurare, narro, narrare, nego, negare, nuntio, nuntiare, polliceor, polliceri, probo, probare, promitto, promittere, refero, referre, spero, sperare, audio, audire, comperio, comperire, sentio, sentire, video, videre
    • Accusative Noun Endings: um, os, em, es, am, as, us, a
    • The lamonpy library lemmatizes Latin verbs (reduces each conjugated form to its root, unconjugated form) and identifies parts of speech.
      • Lamon (LAtin MOrphological tools, pronounced /leɪmən/) is a simple POS tagger and lemmatizer for Latin written in C++, and Lamonpy is its Python package. Lamonpy returns the lemma and tag of each word in a given text.
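
The two approaches described above can be illustrated with a minimal sketch. This is not the notebook's actual code: the function names, the Token structure and its simplified "kind" tag are hypothetical, the phrase and headverb lists are short samples of the defaults, and the tokens are assumed to be lemmatized and tagged already (for example by lamonpy, whose API is not shown here). The sketch also assumes that the three Latin elements must appear in order but need not be adjacent.

```python
import re
from dataclasses import dataclass

# --- Spanish: phrase matching ---
# Short sample of the default phrase list (not the full list).
SPANISH_PHRASES = ["le dijo que", "les dijo que", "me dijo que",
                   "le dixo que", "dixole que"]


def find_spanish_indirect(text, phrases=SPANISH_PHRASES):
    """Return the sentences of `text` that contain any of `phrases`."""
    # Whole-word match, so "me dijo que" does not fire inside "come dijo que".
    alternation = "|".join(re.escape(p) for p in phrases)
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    # Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern.search(s)]


# --- Latin: headverb / accusative / infinitive ordering ---
HEADVERB_LEMMAS = {"dico", "puto", "scio", "audio", "video", "nego"}  # sample
ACCUSATIVE_ENDINGS = ("um", "os", "em", "es", "am", "as", "us", "a")


@dataclass
class Token:
    form: str   # the word as written
    lemma: str  # root form, as a lemmatizer would report it
    kind: str   # hypothetical simplified tag: "verb", "inf", "noun", "other"


def role(tok):
    """Map a token to "H" (headverb), "A" (accusative), "I" (infinitive) or None."""
    if tok.kind == "inf":
        return "I"
    if tok.kind == "verb" and tok.lemma in HEADVERB_LEMMAS:
        return "H"
    if tok.kind == "noun" and tok.form.lower().endswith(ACCUSATIVE_ENDINGS):
        return "A"
    return None


def latin_pattern(tokens):
    """Return "HAI", "IAH" or None for one sentence of tagged tokens."""
    roles = [r for r in map(role, tokens) if r]
    for name in ("HAI", "IAH"):
        remaining = iter(roles)
        # Each letter must be found after the previous one (ordered subsequence).
        if all(letter in remaining for letter in name):
            return name
    return None
```

For example, the tagged tokens for "dicit Caesarem venire" (headverb dicit, accusative Caesarem, infinitive venire) would be reported as HAI, and "venire Caesarem dicit" as IAH. Matching on accusative endings alone is permissive (many non-accusative forms end in "a" or "us"), so results from a check like this are best treated as candidates for review.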

Requirements

  • Documents can be in .txt, .docx, or .pdf format.
  • PDF files must already contain recognized (selectable) text. This Notebook does not provide OCR capabilities.
  • Files must be stored in a top-level directory in your Google Drive account. You will need to grant permission to access your Google Drive, and then select the top-level folder that contains the documents.
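
As a rough illustration of the format requirement, the sketch below lists the documents in a folder that have one of the supported extensions. The function name and the folder argument are hypothetical; the notebook's own file selection may work differently.

```python
from pathlib import Path

# Formats named in the requirements above.
SUPPORTED_EXTENSIONS = {".txt", ".docx", ".pdf"}


def list_supported_documents(folder):
    """Return the supported documents directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        # Compare extensions case-insensitively so "scan.PDF" is accepted too.
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Only files directly inside the folder are considered, which mirrors the requirement that documents sit together in one top-level directory. A check like this cannot tell whether a PDF contains recognized text, so scanned PDFs still need OCR beforehand.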
Identify Indirect Voice Sentences – Spanish, Latin
