Posts by Collection
Wisdom that emerged over years of system design and implementation.
Lessons developed from experience with the Unix philosophy, framed not as vague generalities but as specific prescriptions.
Strategies to improve the performance or efficiency of the system.
Patented at the US Patent Office in 2018, with patent number 9892150 and application number 14816805
A database architecture includes at least an in-memory database and a disk-based database (also referred to as “hot” and “warm” data stores). In the database architecture, data can be partitioned (and re-partitioned) and/or moved within and among the in-memory and disk-based databases, based on query access patterns derived from received database queries. The partitions and inter-database movements can be based at least in part on clustered, dynamic data units that are defined using shared individual attribute values of data records, and updated based on the received queries.
Recommended citation: Jayanth, Jayanth, Dastagiri Reddy, and Reghu Ram Thanumalayan. "Unified data management for database systems." U.S. Patent 9,892,150, issued February 13, 2018.
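The abstract describes placing data units in the hot (in-memory) or warm (disk-based) store according to query access patterns. A minimal Python sketch of that idea follows, under simplifying assumptions that are not taken from the patent: a data unit is simply the set of records sharing one attribute value, the "access pattern" is a plain per-unit query count, and the hot tier has a fixed record capacity. The class and method names (`TieredStore`, `rebalance`, `tier_of`) are hypothetical, and the patented clustering and re-partitioning logic is not reproduced.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy two-tier store with a 'hot' (in-memory) and a 'warm' (disk) tier.

    Hypothetical sketch: records are grouped into data units by the value of a
    single attribute, and units are promoted to the hot tier by query frequency.
    """
    partition_key: str
    hot_capacity: int  # maximum number of records the hot tier may hold
    units: dict = field(default_factory=dict)  # attribute value -> records
    access_counts: Counter = field(default_factory=Counter)
    hot: set = field(default_factory=set)  # attribute values placed in the hot tier

    def insert(self, record: dict) -> None:
        # Each distinct attribute value defines one data unit.
        self.units.setdefault(record[self.partition_key], []).append(record)

    def query(self, value) -> list:
        # Record the access so that later rebalancing reflects the query pattern.
        self.access_counts[value] += 1
        return self.units.get(value, [])

    def rebalance(self) -> None:
        # Fill the hot tier with the most frequently queried units that still
        # fit in the capacity; all other units live in the warm tier.
        self.hot = set()
        used = 0
        for value, _count in self.access_counts.most_common():
            size = len(self.units.get(value, []))
            if size and used + size <= self.hot_capacity:
                self.hot.add(value)
                used += size

    def tier_of(self, value) -> str:
        return "hot" if value in self.hot else "warm"


store = TieredStore(partition_key="region", hot_capacity=2)
for region in ["EU", "EU", "US", "APAC"]:
    store.insert({"region": region})
for region in ["US", "US", "APAC", "EU"]:
    store.query(region)
store.rebalance()
print({r: store.tier_of(r) for r in ["EU", "US", "APAC"]})
```

A greedy fill by access count is only one possible placement policy; it keeps the example short but ignores, for instance, the cost of moving units between tiers and any decay of old access counts.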
Published in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing in 2015
Opinion summarization is the task of producing a summary of a text such that the summary also preserves the sentiment of the text. Opinion summarization is thus a trade-off between summarization and sentiment analysis: the demand for compression may drop sentiment-bearing sentences, and the demand for sentiment detection may bring in redundant sentences. We harness the power of submodularity to strike a balance between these two conflicting requirements. We investigate an incipient class of submodular functions for the problem, and a partial-enumeration-based greedy algorithm that has a performance guarantee of 63%. Our functions generate summaries whose sentiment correlates well with the document sentiment while also achieving good ROUGE scores, outperforming the state-of-the-art algorithms.
Recommended citation: Jayanth, Jayanth, Jayaprakash Sundararaj, and Pushpak Bhattacharyya. "Monotone submodularity in opinion summaries." In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 169-178. 2015.
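The abstract pairs monotone submodular objectives with a partial-enumeration greedy algorithm under a length budget. The 63% figure is 1 − 1/e ≈ 0.632, the ratio associated with this scheme when seeds of three elements are enumerated. The Python sketch below shows the generic scheme only: every feasible set smaller than `d` is scored directly, every feasible seed of exactly `d` sentences is extended greedily by marginal gain per unit cost, and the best result is kept. The objective (distinct-word coverage plus a sentiment-strength bonus), the sentiment scores and the word-count budget are hypothetical stand-ins, not the functions investigated in the paper.

```python
from itertools import combinations


def make_objective(sentences, sentiment, lam=1.0):
    """Hypothetical monotone submodular objective (not the paper's functions):
    distinct-word coverage plus a nonnegative, modular sentiment-strength bonus."""
    def f(selected):
        covered = set()
        for i in selected:
            covered |= set(sentences[i].lower().split())
        return len(covered) + lam * sum(abs(sentiment[i]) for i in selected)
    return f


def greedy_extend(f, costs, budget, seed):
    """Greedily add the sentence with the best marginal gain per unit cost
    until no remaining sentence fits the budget or improves the objective."""
    selected = list(seed)
    spent = sum(costs[i] for i in selected)
    remaining = set(range(len(costs))) - set(selected)
    while remaining:
        base = f(selected)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue
            ratio = (f(selected + [i]) - base) / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break
        selected.append(best)
        spent += costs[best]
        remaining.discard(best)
    return selected


def partial_enumeration_greedy(f, costs, budget, d=3):
    """Score every feasible set of fewer than d sentences directly, greedily
    extend every feasible seed of exactly d sentences, and keep the best."""
    n = len(costs)
    best_set, best_val = [], f([])
    for size in range(1, d + 1):
        for seed in combinations(range(n), size):
            if sum(costs[i] for i in seed) > budget:
                continue
            candidate = list(seed) if size < d else greedy_extend(f, costs, budget, seed)
            value = f(candidate)
            if value > best_val:
                best_set, best_val = candidate, value
    return sorted(best_set), best_val


sentences = [
    "the battery life is excellent",
    "the screen is dim",
    "battery life is excellent and great",
    "shipping was fast",
]
sentiment = [0.9, -0.6, 0.9, 0.3]  # hypothetical sentence polarity scores
costs = [len(s.split()) for s in sentences]  # the budget is counted in words
f = make_objective(sentences, sentiment)
summary, value = partial_enumeration_greedy(f, costs, budget=9)
print(summary, value)
```

Enumerating seeds costs O(n^d) objective evaluations, which is why the plain cost-benefit greedy is often used alone in practice; the enumeration is what buys the worst-case guarantee.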
Published as arXiv preprint 1609.07823 in 2016
In-memory columnar databases have become mainstream over the last decade and have vastly improved the processing of large volumes of data through multi-core parallelism and in-memory compression, thereby eliminating the usual bottlenecks associated with disk-based databases. In scenarios where the data volume grows into terabytes and petabytes, keeping all the data in memory is exorbitantly expensive. Hence, the data is compressed efficiently using different algorithms that are designed to exploit multi-core parallelization for query processing. Several methods have been studied for compressing the column array after dictionary encoding. In this paper, we present two novel optimizations of compression techniques, Block Size Optimized Cluster Encoding and Block Size Optimized Indirect Encoding, which perform better than their predecessors. Finally, we propose heuristics to choose the best encoding among common compression schemes.
Recommended citation: Jayanth, Jayanth. "Optimizations and Heuristics to improve Compression in Columnar Database Systems." arXiv preprint arXiv:1609.07823 (2016).
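Cluster encoding, as commonly described for dictionary-encoded column stores, splits the value-ID array into fixed-size blocks and collapses every block that holds a single distinct value into one entry, with a bit vector marking which blocks were collapsed. The Python sketch below illustrates that scheme and a simple block-size search in the spirit of the "block size optimized" variant. The paper's actual cost model is not given here, so the size estimate (one bit per block plus ⌈log2(dictionary size)⌉ bits per stored value ID) and the candidate block sizes are assumptions; indirect encoding is not shown.

```python
from math import ceil, log2


def dictionary_encode(column):
    """Replace each value by its index in a sorted dictionary of distinct values."""
    dictionary = sorted(set(column))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in column]


def cluster_encode(value_ids, block_size):
    """Collapse every block whose entries are all equal into a single entry.

    Returns (bitvector, payload): bitvector[b] is True if block b was collapsed.
    """
    bitvector, payload = [], []
    for start in range(0, len(value_ids), block_size):
        block = value_ids[start:start + block_size]
        if len(set(block)) == 1:
            bitvector.append(True)
            payload.append(block[0])
        else:
            bitvector.append(False)
            payload.extend(block)
    return bitvector, payload


def cluster_decode(bitvector, payload, block_size, length):
    """Invert cluster_encode; the last block may be shorter than block_size."""
    out, pos = [], 0
    for b, collapsed in enumerate(bitvector):
        n = min(block_size, length - b * block_size)
        if collapsed:
            out.extend([payload[pos]] * n)
            pos += 1
        else:
            out.extend(payload[pos:pos + n])
            pos += n
    return out


def encoded_bits(bitvector, payload, dict_size):
    """Assumed size model: one bit per block plus fixed-width value IDs."""
    bits_per_id = max(1, ceil(log2(dict_size)))
    return len(bitvector) + len(payload) * bits_per_id


def best_block_size(value_ids, dict_size, candidates=(2, 4, 8, 16, 32, 64)):
    """Hypothetical search: pick the candidate block size with the smallest encoding."""
    return min(candidates,
               key=lambda bs: encoded_bits(*cluster_encode(value_ids, bs), dict_size))


column = ["DE"] * 16 + ["US", "FR"] * 4 + ["DE"] * 8
dictionary, ids = dictionary_encode(column)
block_size = best_block_size(ids, len(dictionary))
bitvector, payload = cluster_encode(ids, block_size)
assert cluster_decode(bitvector, payload, block_size, len(ids)) == ids
print(block_size, encoded_bits(bitvector, payload, len(dictionary)))
```

Small blocks collapse more often but pay more bit-vector overhead, while large blocks rarely contain a single value; searching over block sizes is one straightforward way to balance the two for a given column.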
Published in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) in 2018
Assyriology, the discipline that studies cuneiform sources and their context, has enormous potential for the application of computational linguistics theory and methods on account of the significant quantity of transcribed texts that are available in digital form but remain largely unexploited. As part of the Machine Translation and Automated Analysis of Cuneiform Languages project (https://cdli-gh.github.io/mtaac/), we aim to bring together corpus data, lexical data, linguistic annotations and object metadata in order to help resolve data processing and integration challenges in the field of Assyriology as a whole, as well as in related fields of research such as linguistics and history. Data sparsity presents a challenge to our goal of automated transliteration of the administrative texts of the Ur III period. To mitigate this, we have undertaken to annotate the whole corpus. To this end, we have developed an annotation pipeline to facilitate the annotation of our gold corpus. This toolset can be re-employed to annotate any Sumerian text and will be integrated into the Cuneiform Digital Library Initiative (https://cdli.ucla.edu) infrastructure. To share these new data, we have also mapped our data to existing LOD and LLOD ontologies and vocabularies. This article details the processing of Sumerian linguistic data using our pipeline, from raw transliterations to rich, structured data in the form of (L)LOD. We describe the morphological and syntactic annotation, with a particular focus on the publication of our datasets as LOD. This application of LLOD in Assyriology is unique and involves the concept of an LLOD edition of a linguistically annotated corpus of Sumerian, as well as linking with lexical resources, repositories of annotation terminology, and finally the museum collections in which the artifacts bearing these inscribed texts are kept.
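Recommended citation: Chiarcos, Christian, Ilya Khait, Émilie Pagé-Perron, Niko Schenk, Jayanth, and Lucas Reckling. "Annotating Sumerian: A LLOD-enhanced Workflow for Cuneiform Corpora." In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018.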
Published in Information (Multidisciplinary Digital Publishing Institute) in 2018
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of the history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential but lacks modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The resulting rich, structured data is then to be made accessible as (Linguistic) Linked Open Data (LLOD), which should open it to a larger research community. Our contribution is two-fold. In terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.
Recommended citation: Chiarcos, Christian, Ilya Khait, Émilie Pagé-Perron, Niko Schenk, Jayanth, Christian Fäth, Julius Steuer, William Mcgrath, and Jinyan Wang. "Annotating a low-resource language with LLOD technology: Sumerian morphology and syntax." Information 9, no. 11 (2018): 290.
Published by the University of California, Los Angeles, in 2019
Many current challenges in natural language processing and computer vision involve multiple objectives simultaneously. In this article, we study different methods for solving such multi-objective problems on the CIFAR-100 and SEMEVAL datasets and compare them with traditional deep learning methods. The multi-output method achieves better results than training a separate neural network from scratch for each objective. Multi-objective deep learning with weights achieves comparable results.
Recommended citation: Jayanth, Jayanth. Multiobjective Deep Learning. University of California, Los Angeles, 2019.
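In a multi-output setup, one shared network feeds a separate output head per objective, and training minimizes a combination of the per-head losses. The Python sketch below shows only that combination step for the weighted variant, assuming cross-entropy losses and fixed, hand-chosen weights. The heads, weights and probabilities are hypothetical, and the weighting scheme actually used in this work is not described here.

```python
from math import log


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold label under a predicted distribution."""
    return -log(probs[label])


def weighted_multi_objective_loss(outputs, labels, weights):
    """Scalarize several objectives as a weighted sum: L = sum_k w_k * L_k.

    outputs[k] is the class distribution predicted by output head k,
    labels[k] is its gold label, and weights[k] is its (hypothetical) weight.
    """
    if not (len(outputs) == len(labels) == len(weights)):
        raise ValueError("expected one output, label and weight per objective")
    return sum(w * cross_entropy(p, y) for p, y, w in zip(outputs, labels, weights))


# Two hypothetical output heads sharing one network trunk.
outputs = [[0.5, 0.25, 0.25], [0.5, 0.5]]
labels = [0, 1]
weights = [1.0, 0.5]
loss = weighted_multi_objective_loss(outputs, labels, weights)
print(loss)
```

Setting all weights to 1.0 recovers the plain multi-output sum; changing the weights trades off how strongly each objective drives the shared parameters during training.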