Technology

The rapid accumulation of experimental data presents both opportunities and challenges for scientific research. While vast datasets enable deeper insights and evidence-based discoveries, they also create obstacles such as information overload, data reliability issues, and normalization and integration difficulties. The sheer volume of scientific information exceeds human processing capacity, making interpretation increasingly complex.

LLMs address the information overload problem by learning from massive amounts of text data, but their summarization and synthesis capabilities come with significant limitations: they are enormously computationally intensive, lack transparency in tracing information back to its sources, and may generate unreliable outputs, particularly when dealing with scarce information.

Our technological approach overcomes these barriers by splitting knowledge processing and learning into three distinct phases: entity recognition, natural language processing, and knowledge learning. The result is a significantly faster, more robust, and more interpretable system that finds and summarizes relevant information quickly and efficiently.
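As a rough illustration of how the three phases compose, consider the skeleton below. The function names are hypothetical placeholders, not our actual API; each phase is sketched in more detail in its section further down.

```python
# Hypothetical skeleton of the three-phase pipeline; all names are illustrative.

def recognize_entities(text: str) -> str:
    """Phase 1: mark normalized taxonomy entities found in the raw text."""
    ...  # dictionary-based matching, sketched in the Entity recognition section

def parse_to_triplets(annotated_text: str) -> list:
    """Phase 2: split sentences into Subject-Verb-Object (SVO) triplets."""
    ...  # deterministic parsing, sketched in the Natural Language Processing section

def learn_knowledge_model(triplets: list) -> dict:
    """Phase 3: cluster triplets into relationship types between entity types."""
    ...  # clustering, sketched in the Knowledge learning section

def process_corpus(documents: list) -> dict:
    """Chain the three phases over a corpus of documents."""
    triplets = [t for doc in documents
                for t in parse_to_triplets(recognize_entities(doc))]
    return learn_knowledge_model(triplets)
```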

Below are the key components and highlights of our technology.


Entity recognition

  • Based on proprietary normalized biomedical taxonomy covering millions of entities, including gene/protein names, diseases, biological processes, cells, tissues, organs, chemicals, medical procedures, etc.
  • Taxonomies are compiled and cross-linked from public taxonomies and ontologies
  • Taxonomies undergo multiple rounds of cleaning and enrichment to ensure high detection accuracy
  • Highly accurate and efficient algorithm for matching and marking taxonomy entities in text corpora (a simplified sketch follows this list)
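As a toy illustration of dictionary-based entity matching, the sketch below uses a greedy longest-match lookup. The miniature taxonomy and its identifiers are illustrative stand-ins, not our actual normalized biomedical taxonomy or matching algorithm.

```python
# Toy taxonomy: surface form -> (entity type, normalized identifier).
# Entries and identifiers are illustrative only.
TAXONOMY = {
    "p53": ("GENE", "TP53"),
    "tp53": ("GENE", "TP53"),
    "breast cancer": ("DISEASE", "MESH:D001943"),
    "apoptosis": ("BIOLOGICAL_PROCESS", "GO:0006915"),
}
MAX_ENTITY_WORDS = 2  # longest multi-word entity in this toy dictionary

def recognize_entities(sentence: str):
    """Greedy longest-match lookup of taxonomy entities in a sentence."""
    tokens = [t.strip(".,;:()") for t in sentence.lower().split()]
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for span in range(min(MAX_ENTITY_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + span])
            if candidate in TAXONOMY:
                entity_type, normalized_id = TAXONOMY[candidate]
                matches.append((candidate, entity_type, normalized_id))
                i += span
                break
        else:
            i += 1  # no entity starts at this token
    return matches

print(recognize_entities("P53 mutations are frequent in breast cancer."))
```

Trying longer spans first ensures that multi-word entities such as "breast cancer" win over any single-word prefixes they might contain.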

Natural Language Processing

  • Proprietary lexicon, grammar and fast deterministic parsing algorithm
  • Lightning-fast text processing of up to 10,000 sentences per second
  • Sentences are split into Subject-Verb-Object (SVO) triplets
  • Triplets capture the grammatical structure of Subject and Object and are further characterized by a number of linguistic properties
  • SVO Triplets form the building blocks for information summarization
  • Triplets are traceable back to the source documents and specific sentences
  • Triplets are additionally compressed into short token fingerprints/signatures representing the essence of their meaning (the relationship); a simplified sketch follows this list
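The sketch below illustrates one possible shape for a traceable SVO triplet and its fingerprint. The fields, the hash-based signature, and the example document identifier are assumptions made for illustration; the actual parser, linguistic properties, and signature scheme are proprietary.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Triplet:
    """One SVO triplet, traceable back to its source document and sentence."""
    subject: str
    verb: str
    obj: str
    doc_id: str        # traceability: source document (hypothetical identifier)
    sentence_no: int   # traceability: sentence within the document
    properties: dict   # linguistic properties (e.g. negation, voice)

    def fingerprint(self) -> str:
        """Compress the triplet into a short signature of its core relationship.
        Here: a truncated hash of the normalized S/V/O parts; the real
        signature scheme may differ."""
        core = "|".join(p.lower().strip() for p in (self.subject, self.verb, self.obj))
        return hashlib.sha1(core.encode()).hexdigest()[:12]

t = Triplet(
    subject="TP53", verb="regulates", obj="apoptosis",
    doc_id="doc-123",  # hypothetical source document id
    sentence_no=4,
    properties={"negated": False, "voice": "active"},
)
print(t.subject, t.verb, t.obj, "->", t.fingerprint())
```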

Knowledge learning

  • Knowledge model is defined as a set of possible relationships between different types of entities
  • Individual triplets describe specific types of relationship
  • Knowledge model is learned from corpora by clustering similar triplets connecting individual pairs of entities
  • Clustering procedure can be guided by domain experts (see the sketch below)
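A minimal sketch of the idea follows, with a hand-written verb-synonym map standing in for the learned, expert-guidable clustering; all verbs, labels, and entity types here are illustrative.

```python
from collections import defaultdict

# Illustrative verb clusters that a domain expert might confirm or adjust.
VERB_CLUSTERS = {
    "regulates": "REGULATION", "controls": "REGULATION", "modulates": "REGULATION",
    "inhibits": "INHIBITION", "suppresses": "INHIBITION",
}

def learn_knowledge_model(triplets):
    """Aggregate triplets into possible relationship types between entity types."""
    model = defaultdict(set)  # (subject_type, object_type) -> relationship labels
    for subject_type, verb, object_type in triplets:
        label = VERB_CLUSTERS.get(verb, verb.upper())
        model[(subject_type, object_type)].add(label)
    return dict(model)

triplets = [
    ("GENE", "regulates", "BIOLOGICAL_PROCESS"),
    ("GENE", "suppresses", "BIOLOGICAL_PROCESS"),
    ("CHEMICAL", "inhibits", "GENE"),
]
print(learn_knowledge_model(triplets))
```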

Indexing and searching

  • Triplets are indexed using proprietary indexing engine
  • S/V/O parts of triplets are indexed separately, which allows construction of complex queries and retrieval of information with high accuracy
  • User queries are transformed into a search plan over the S/V/O indexes to find and retrieve relevant triplets (see the sketch below)
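The following sketch shows the core idea of per-part inverted indexes and intersection-based retrieval. The class and its methods are hypothetical; the production indexing engine is proprietary.

```python
from collections import defaultdict

class TripletIndex:
    """Toy per-part inverted indexes over S/V/O triplets."""

    def __init__(self):
        self.by_subject = defaultdict(set)  # subject -> triplet ids
        self.by_verb = defaultdict(set)     # verb -> triplet ids
        self.by_object = defaultdict(set)   # object -> triplet ids
        self.triplets = {}                  # triplet id -> (S, V, O)

    def add(self, tid, subject, verb, obj):
        self.triplets[tid] = (subject, verb, obj)
        self.by_subject[subject].add(tid)
        self.by_verb[verb].add(tid)
        self.by_object[obj].add(tid)

    def search(self, subject=None, verb=None, obj=None):
        """Simple query plan: intersect postings for the constrained parts."""
        postings = [index[key] for index, key in
                    ((self.by_subject, subject),
                     (self.by_verb, verb),
                     (self.by_object, obj))
                    if key is not None]
        hits = set.intersection(*postings) if postings else set()
        return [self.triplets[tid] for tid in hits]

index = TripletIndex()
index.add(1, "TP53", "regulates", "apoptosis")
index.add(2, "TP53", "inhibits", "MDM2")
print(index.search(subject="TP53", verb="regulates"))
```

Because each part is indexed separately, a query can constrain any combination of subject, verb, and object, which is what enables complex, high-precision searches.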

Summarization and Categorization

  • Learned knowledge model rules are applied to retrieved triplets to categorize them into topics
  • Linguistic properties of triplets are used for sorting and filtering to identify the most relevant ones
  • Summary is constructed from triplets that belong to specific topics (a simplified sketch follows)
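As a final illustration, the sketch below categorizes retrieved triplets into topics with simple rules and orders them by a relevance score. The rules, topic names, and scores are illustrative stand-ins for the learned knowledge-model rules and the sorting/filtering on real linguistic properties.

```python
# Illustrative learned rules: relationship label -> topic.
TOPIC_RULES = {"REGULATION": "Gene regulation", "INHIBITION": "Drug effects"}

def summarize(retrieved):
    """retrieved: list of (sentence_text, relationship_label, relevance_score)."""
    topics = {}
    for text, label, relevance in retrieved:
        topic = TOPIC_RULES.get(label, "Other")
        topics.setdefault(topic, []).append((relevance, text))
    # Highest-relevance triplets first within each topic; the score stands in
    # for sorting/filtering on the triplets' linguistic properties.
    return {topic: [text for _, text in sorted(items, reverse=True)]
            for topic, items in topics.items()}

retrieved = [
    ("TP53 regulates apoptosis", "REGULATION", 0.9),
    ("Nutlin-3 inhibits MDM2", "INHIBITION", 0.8),
]
print(summarize(retrieved))
```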

To learn more about AI Evolution Labs, please email us at: