Posts by Collection

manifesto

medium_posts

patents

Unified data management for database systems

Granted by the US Patent Office as patent number 9,892,150 (application number 14/816,805) in 2018

A database architecture includes at least an in-memory database and a disk-based database (also referred to as “hot” and “warm” data stores). In the database architecture, data can be partitioned (and re-partitioned) and/or moved within and among the in-memory and disk-based databases, based on query access patterns derived from received database queries. The partitions and inter-database movements can be based at least in part on clustered, dynamic data units that are defined using shared individual attribute values of data records, and updated based on the received queries. Read more
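
The snippet below is a minimal, illustrative sketch (not the patented implementation) of the core idea: records are clustered into data units by a shared attribute value, incoming queries update per-unit access counts, and units are kept in the in-memory store or left in the disk-based store based on those counts. The class name, threshold, and rebalancing policy are assumptions made for illustration.

```python
from collections import defaultdict

class TieredStore:
    """Toy hot/warm placement driven by query access patterns (illustrative only)."""

    def __init__(self, hot_capacity=2):
        self.hot_capacity = hot_capacity         # how many data units stay in memory
        self.units = defaultdict(list)           # shared attribute value -> records
        self.access_counts = defaultdict(int)    # shared attribute value -> query hits
        self.hot = set()                         # unit keys currently in the in-memory store

    def insert(self, record, cluster_attr):
        # Cluster records into dynamic data units keyed by a shared attribute value.
        self.units[record[cluster_attr]].append(record)

    def query(self, cluster_value):
        # Record the access pattern, then re-evaluate which units stay in memory.
        self.access_counts[cluster_value] += 1
        self._rebalance()
        tier = "hot" if cluster_value in self.hot else "warm"
        return tier, self.units.get(cluster_value, [])

    def _rebalance(self):
        # Keep the most frequently queried units in the in-memory ("hot") store.
        ranked = sorted(self.access_counts, key=self.access_counts.get, reverse=True)
        self.hot = set(ranked[:self.hot_capacity])

store = TieredStore(hot_capacity=1)
store.insert({"region": "EU", "amount": 10}, cluster_attr="region")
store.insert({"region": "US", "amount": 7}, cluster_attr="region")
print(store.query("EU"))   # ('hot', [...]) because EU is now the most-accessed unit
```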

Recommended citation: Jayanth, Jayanth, Dastagiri Reddy, and Reghu Ram Thanumalayan. "Unified data management for database systems." U.S. Patent 9,892,150, issued February 13, 2018.
https://patentimages.storage.googleapis.com/ef/08/44/a29ac8da475627/US9892150.pdf

Rapid Heuristics Engine - A method and apparatus for rapidly defining and applying statistical heuristics for filtering network traffic

Filed with the US Patent Office under application number 18/622,698 in 2024; patent number TBD

The Rapid Heuristics Engine provides a no-code interface for defining and applying statistical heuristics to filter invalid network traffic, significantly reducing the time and effort required for deployment and optimization. This system allows data scientists to independently manage heuristic rules, enhancing efficiency and ensuring faster mitigation of invalid traffic. Read more
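
A minimal sketch of the general idea, assuming a hypothetical rule schema: a statistical heuristic is expressed as data (a metric plus a threshold) and applied to traffic records without code changes. The field names and the three-sigma rule below are illustrative, not the engine's actual interface.

```python
import statistics

# Hypothetical declarative rule: flag sources whose request rate sits far above
# the population mean. In a no-code setting, a rule like this would be authored
# as configuration by a data scientist rather than hard-coded.
RULE = {"metric": "requests_per_minute", "threshold_sigma": 3.0}

def flag_invalid(traffic, rule):
    """Return source IPs whose metric exceeds mean + k * stdev."""
    values = [row[rule["metric"]] for row in traffic]
    cutoff = statistics.mean(values) + rule["threshold_sigma"] * statistics.pstdev(values)
    return [row["ip"] for row in traffic if row[rule["metric"]] > cutoff]

# Toy traffic log: ten well-behaved sources and one abusive one.
traffic = [{"ip": f"10.0.0.{i}", "requests_per_minute": 10 + i % 3} for i in range(10)]
traffic.append({"ip": "203.0.113.9", "requests_per_minute": 480})
print(flag_invalid(traffic, RULE))   # ['203.0.113.9']
```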


publications

Monotone submodularity in opinion summaries

Published in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)

Opinion summarization is the task of producing a summary of a text such that the summary also preserves the sentiment of the text. Opinion summarization is thus a trade-off between summarization and sentiment analysis: the demand for compression may drop sentiment-bearing sentences, while the demand for sentiment detection may bring in redundant sentences. We harness the power of submodularity to strike a balance between these two conflicting requirements. We investigate an incipient class of submodular functions for the problem, and a partial-enumeration-based greedy algorithm that has a performance guarantee of 63%. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment, along with good ROUGE scores, outperforming the state-of-the-art algorithms. Read more
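
A minimal sketch of the underlying technique: greedy maximization of a monotone submodular coverage objective under a length budget. The scoring function and sentences are toy stand-ins rather than the paper's functions, and the plain greedy shown here is simpler than the partial-enumeration variant that carries the 63% (i.e. 1 - 1/e) guarantee.

```python
def coverage(summary, word_weights):
    """Monotone submodular objective: weighted coverage of distinct words.

    Adding a sentence never lowers the score, and the marginal gain of a
    sentence shrinks as the summary already covers more of its words.
    """
    covered = {w for s in summary for w in s.split()}
    return sum(word_weights.get(w, 0.0) for w in covered)

def greedy_summary(sentences, word_weights, budget):
    """Greedily add the sentence with the best marginal gain that still fits the word budget."""
    summary, remaining = [], list(sentences)
    while remaining:
        used = sum(len(s.split()) for s in summary)
        feasible = [s for s in remaining if used + len(s.split()) <= budget]
        if not feasible:
            break
        best = max(
            feasible,
            key=lambda s: coverage(summary + [s], word_weights) - coverage(summary, word_weights),
        )
        summary.append(best)
        remaining.remove(best)
    return summary

# Toy weights mixing informativeness with sentiment strength, the trade-off
# that the paper's submodular functions formalize.
weights = {"battery": 2.0, "great": 1.5, "screen": 1.0, "poor": 1.5}
sentences = ["battery life is great", "screen is great", "poor battery"]
print(greedy_summary(sentences, weights, budget=6))
# -> ['battery life is great', 'poor battery']
```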

Recommended citation: Jayanth, Jayanth, Jayaprakash Sundararaj, and Pushpak Bhattacharyya. "Monotone submodularity in opinion summaries." In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 169-178. 2015.
https://aclanthology.org/D15-1017.pdf

Optimizations and Heuristics to improve Compression in Columnar Database Systems

Published as an arXiv preprint (arXiv:1609.07823) in 2016

In-memory columnar databases have become mainstream over the last decade and have vastly improved the processing of large volumes of data through multi-core parallelism and in-memory compression, thereby eliminating the usual bottlenecks associated with disk-based databases. For scenarios where the data volume grows into terabytes and petabytes, keeping all the data in memory is exorbitantly expensive. Hence, the data is compressed efficiently using different algorithms so that query processing can still exploit multi-core parallelization. Several compression methods have been studied for compressing the column array after dictionary encoding. In this paper, we present two novel optimizations in compression techniques - Block Size Optimized Cluster Encoding and Block Size Optimized Indirect Encoding - which perform better than their predecessors. Finally, we also propose heuristics to choose the best encoding amongst common compression schemes. Read more
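
A toy sketch of cluster encoding over a dictionary-encoded column, with a crude cost model hinting at how a block-size-optimized variant could choose its block size. The actual encodings and optimizations in the paper are more involved; everything below is illustrative.

```python
def cluster_encode(value_ids, block_size=4):
    """Toy cluster encoding of a dictionary-encoded column.

    The column of value IDs is split into fixed-size blocks; a block holding a
    single repeated ID is stored as that one ID, other blocks are kept verbatim,
    and a bit vector records which blocks were compressed.
    """
    flags, blocks = [], []
    for i in range(0, len(value_ids), block_size):
        block = value_ids[i:i + block_size]
        if len(block) == block_size and len(set(block)) == 1:
            flags.append(1)
            blocks.append([block[0]])   # single-value block: store one ID
        else:
            flags.append(0)
            blocks.append(block)        # mixed (or trailing) block: store as-is
    return flags, blocks

def encoded_size(value_ids, block_size):
    """Crude cost model (total IDs stored); a block-size-optimized variant
    would search for the block size that minimizes a cost like this."""
    _, blocks = cluster_encode(value_ids, block_size)
    return sum(len(b) for b in blocks)

column = [7, 7, 7, 7, 7, 7, 7, 7, 3, 1, 3, 2]   # value IDs after dictionary encoding
print(cluster_encode(column))                    # ([1, 1, 0], [[7], [7], [3, 1, 3, 2]])
print(min(range(2, 9), key=lambda b: encoded_size(column, b)))  # cheapest block size under this toy cost
```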

Recommended citation: Jayanth, Jayanth. "Optimizations and Heuristics to improve Compression in Columnar Database Systems." arXiv preprint arXiv:1609.07823 (2016).
http://arxiv.org/pdf/1609.07823v1

Annotating Sumerian: A LLOD-enhanced Workflow for Cuneiform Corpora

Published in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Assyriology, the discipline that studies cuneiform sources and their context, has enormous potential for the application of computational linguistics theory and method on account of the significant quantity of transcribed texts that are available in digital form but that remain as yet largely unexploited. As part of the Machine Translation and Automated Analysis of Cuneiform Languages project (https://cdli-gh.github.io/mtaac/), we aim to bring together corpus data, lexical data, linguistic annotations and object metadata in order to contribute to resolving data processing and integration challenges in the field of Assyriology as a whole, as well as for related fields of research such as linguistics and history. Data sparsity presents a challenge to our goal of the automated transliteration of the administrative texts of the Ur III period. To mitigate this situation we have undertaken to annotate the whole corpus. To this end we have developed an annotation pipeline to facilitate the annotation of our gold corpus. This toolset can be re-employed to annotate any Sumerian text and will be integrated into the Cuneiform Digital Library Initiative (https://cdli.ucla.edu) infrastructure. To share these new data, we have also mapped our data to existing LOD and LLOD ontologies and vocabularies. This article provides details on the processing of Sumerian linguistic data using our pipeline, from raw transliterations to rich and structured data in the form of (L)LOD. We describe the morphological and syntactic annotation, with a particular focus on the publication of our datasets as LOD. This application of LLOD in Assyriology is unique and involves the concept of a LLOD edition of a linguistically annotated corpus of Sumerian, as well as linking with lexical resources, repositories of annotation terminology, and finally the museum collections in which the artifacts bearing these inscribed texts are kept. Read more
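
As a small, purely illustrative sketch of the LOD publication step, the snippet below emits a few RDF triples for an annotated token using rdflib. The namespaces, property names, and example form are placeholders and do not reproduce the project's actual ontologies, vocabulary mappings, or data.

```python
from rdflib import Graph, Literal, Namespace

# Placeholder namespaces: the project maps its data to existing LOD/LLOD
# vocabularies; every URI below is illustrative only.
EX = Namespace("http://example.org/sumerian/")
NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")

g = Graph()
g.bind("ex", EX)
g.bind("nif", NIF)

token = EX["text1_token3"]
g.add((token, NIF.anchorOf, Literal("lugal")))               # transliterated form
g.add((token, EX.pos, Literal("N")))                         # morphological annotation
g.add((token, EX.dependencyHead, EX["text1_token5"]))        # syntactic link

print(g.serialize(format="turtle"))
```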

Recommended citation: Chiarcos, Christian, Ilya Khait, Émilie Pagé-Perron, Niko Schenk, Jayanth, and Lucas Reckling. "Annotating Sumerian: A LLOD-enhanced Workflow for Cuneiform Corpora." (2018).
http://lrec-conf.org/workshops/lrec2018/W23/pdf/12_W23.pdf

Annotating a low-resource language with LLOD technology: Sumerian morphology and syntax

Published in Information (MDPI) in 2018

This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open it up to a larger research community. Our contribution is two-fold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian. Read more

Recommended citation: Chiarcos, Christian, Ilya Khait, Émilie Pagé-Perron, Niko Schenk, Jayanth, Christian Fäth, Julius Steuer, William Mcgrath, and Jinyan Wang. "Annotating a low-resource language with LLOD technology: Sumerian morphology and syntax." Information 9, no. 11 (2018): 290.
https://www.mdpi.com/2078-2489/9/11/290/pdf

Multiobjective Deep Learning

Published by the University of California, Los Angeles in 2019

Many current challenges in natural language processing and computer vision involve dealing with multiple objectives simultaneously. In this article, we study different methods to solve such multi-objective problems on the CIFAR-100 and SemEval datasets, and compare them with traditional deep learning methods. The multi-output method achieves better results than training a separate neural net from scratch for each objective. Multi-objective deep learning with weights also achieves comparable results. Read more
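
A minimal PyTorch sketch of the multi-output idea described above: a shared trunk with a separate head per objective, trained with a weighted combination of per-objective losses (one plausible reading of "with weights"). The layer sizes, label counts, and the 0.7/0.3 weights are illustrative assumptions, not the article's exact setup.

```python
import torch
import torch.nn as nn

class MultiOutputNet(nn.Module):
    """Shared trunk with one output head per objective (illustrative sizes)."""
    def __init__(self, in_dim=128, hidden=64, n_classes_a=100, n_classes_b=5):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, n_classes_a)   # objective A (e.g. fine labels)
        self.head_b = nn.Linear(hidden, n_classes_b)   # objective B (e.g. coarse labels)

    def forward(self, x):
        h = self.trunk(x)
        return self.head_a(h), self.head_b(h)

model = MultiOutputNet()
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 128)              # a toy batch of feature vectors
y_a = torch.randint(0, 100, (8,))
y_b = torch.randint(0, 5, (8,))

out_a, out_b = model(x)
# Weighted combination of the per-objective losses; the 0.7/0.3 split is arbitrary.
loss = 0.7 * criterion(out_a, y_a) + 0.3 * criterion(out_b, y_b)
loss.backward()
```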

Recommended citation: Jayanth, Jayanth. Multiobjective Deep Learning. University of California, Los Angeles, 2019.
https://escholarship.org/content/qt3ww3r9m2/qt3ww3r9m2.pdf

ESMCrystal: Enhancing Protein Crystallization Prediction Through Protein Embeddings

Published in Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB) in 2024

Protein crystallization is a critical yet challenging step in determining protein structures, crucial for advancing our understanding of biological mechanisms. This study introduces ESMCrystal, a novel approach leveraging protein embeddings derived from the advanced Meta ESMFold2 architecture to predict protein crystallization. By integrating transfer learning techniques, ESMCrystal models demonstrate enhanced predictive performance across various datasets, highlighting the potential of deep learning in structural biology. This research not only improves the predictability of protein crystallization but also sets the stage for broader applications of machine learning in understanding complex biological systems. The standalone source code and models, along with the inference server, are available at https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1 and https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2. Read more
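
A rough sketch of the transfer-learning setup, assuming an ESM-2 checkpoint served through the Hugging Face transformers library: pool the encoder's per-residue embeddings into one vector per protein and fit a simple classifier on top. The checkpoint, pooling, classifier, and toy sequences are illustrative assumptions; the released ESMCrystal models are at the links above and are not loaded here.

```python
import torch
from transformers import AutoTokenizer, EsmModel
from sklearn.linear_model import LogisticRegression

# Illustrative: a small public ESM-2 checkpoint; the released ESMCrystal
# models linked above are not downloaded or reproduced here.
name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = EsmModel.from_pretrained(name)

def embed(sequence: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden states into one vector per protein."""
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, length, dim)
    return hidden.mean(dim=1).squeeze(0)               # (dim,)

# Toy training data: sequences paired with crystallized (1) / not crystallized (0) labels.
seqs = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MENSDSNDKGSDQSAAQRRSQMDRLDREEAF"]
labels = [1, 0]
X = torch.stack([embed(s) for s in seqs]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```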

Recommended citation: Jayanth Kumar, Kavya Jayakumar, and Jayaprakash Sundararaj. "ESMCrystal: Enhancing Protein Crystallization Prediction Through Protein Embeddings." Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB), 2024.
https://easychair.org/publications/preprint/FTCX