As longstanding partners, we sat down with the SciBite team to learn more about the role they see AI playing in semantic technology, how our partnership has evolved over the years, and what companies can do today to bring ontologies into their drug discovery programs.

Tell us an interesting fact about SciBite that people may not know.

An intriguing facet of SciBite that might not be widely known is the remarkable depth and breadth of experience within our Ontologies Team. Comprising a wealth of collective knowledge and expertise, this team boasts an astonishing total of over 100 years of experience in the field. What truly sets them apart is their substantial contributions to some of the most critical public ontologies and resources in the scientific community.

Members of the Ontologies Team have previously held pivotal roles in the development and maintenance of essential resources like the Gene Ontology (GO), Experimental Factor Ontology (EFO), Cell Ontology, UniProt, and ChEMBL. Their dedication to advancing science and promoting data interoperability is evident through their work in collective curation, ensuring that these ontologies remain up-to-date and accurate for the research community.

This wealth of experience not only reflects the dedication of the Ontologies Team but also highlights SciBite’s commitment to delivering top-notch solutions to support scientific research. Their deep-rooted involvement in shaping these fundamental resources underscores their pivotal role in advancing the field of ontologies and data integration. It’s a testament to SciBite’s enduring dedication to fostering innovation and promoting the use of ontologies for knowledge discovery in the ever-evolving landscape of scientific research.

AI is the topic on everyone’s minds, and we know it brings tremendous potential to the life sciences industry. Where do you see AI playing new roles in semantic technology in the coming years?

AI will undoubtedly change the life sciences industry, and its integration with semantic technology promises a bright future filled with innovation and efficiency. AI will play a pivotal role in both advancing semantic technology and supporting its use on multiple fronts in the coming years.

Developing ontologies and terminology in the life sciences can be a time-consuming task, requiring subject matter experts to curate and manage ontologies to support the structuring of data. Well-organized, structured data is a cornerstone of the effective use of any data, yet it is often overlooked when considering an AI implementation. AI will be instrumental in data integration and interoperability and will likely augment and expedite the curation and mapping work being done by subject matter experts.

AI-augmented natural language processing (NLP) is revolutionizing how we search and interact with our data. However, the context, sentiment, and nuances of scientific literature still present challenges for deploying AI-augmented NLP for accurate search and querying of life-science data. Combining ontology-based semantic search with Large Language Models (LLMs) can be synergistic, overcoming shortcomings of the respective technologies. Ontology-based semantic search can be used to accurately identify relevant documents that contain the entities of interest and the potential answer to a query. These documents are then sent to the LLM for formulation of an answer, reducing the search space, providing context, and reducing the chances of unwanted effects such as hallucinations, where an LLM produces a grammatically correct but factually incorrect answer. Combining ontology-based semantic search with an LLM in this way is a form of Retrieval Augmented Generation (RAG), and you can read more about this on SciBite’s recent blog post.
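
The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SciBite's implementation: the ontology, its `EX:` identifiers, the sample documents, and the function names are all hypothetical, entity matching is a naive substring check, and no LLM is actually called; the sketch stops at assembling the prompt that would be sent to one.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: identifier -> lower-case labels and synonyms.
ONTOLOGY = {
    "EX:GENE1": {"tp53", "p53"},
    "EX:DIS1": {"breast cancer", "breast carcinoma"},
}


@dataclass
class Document:
    doc_id: str
    text: str


def tag_entities(text: str) -> set[str]:
    """Return ontology IDs whose synonyms occur in the text (naive substring match)."""
    lowered = text.lower()
    return {
        term_id
        for term_id, synonyms in ONTOLOGY.items()
        if any(s in lowered for s in synonyms)
    }


def retrieve(query: str, documents: list[Document]) -> list[Document]:
    """Keep only documents that mention every entity recognized in the query."""
    wanted = tag_entities(query)
    if not wanted:
        return []
    return [d for d in documents if wanted <= tag_entities(d.text)]


def build_prompt(query: str, hits: list[Document]) -> str:
    """Assemble the text that would be sent to an LLM; no model is called here."""
    sources = "\n".join(f"[{d.doc_id}] {d.text}" for d in hits)
    return (
        "Answer using only the sources below and cite their IDs.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Made-up example documents and query, for illustration only.
DOCS = [
    Document("doc1", "Mutations in p53 are common in breast carcinoma."),
    Document("doc2", "TP53 regulates the cell cycle."),
    Document("doc3", "The weather was sunny all week."),
]
QUERY = "What is the role of TP53 in breast cancer?"
HITS = retrieve(QUERY, DOCS)
PROMPT = build_prompt(QUERY, HITS)
print(PROMPT)
```

Because synonyms resolve to the same identifier, the query's "TP53" and "breast cancer" match a document that says "p53" and "breast carcinoma", and only that document reaches the prompt. The cited document IDs also give a simple provenance trail for the answer.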

Data provenance is important for many use cases in the life sciences, such as pharmacovigilance or target identification. Researchers need to know why a particular article was returned by a search, why an article was not returned, and, in the case of generative AI, the basis on which an answer was formulated. In these cases, it is much easier to explain a human-curated ontology than a large model that is trained on a wide variety of sometimes irrelevant content. A RAG-based approach could also be applied to these types of use cases, with answers derived from a small set of relevant documents that were fed to an LLM and with scientific entities captured using ontologies, and therefore grounded in the knowledge of subject matter experts.

Lastly, AI-driven predictive analytics will become increasingly crucial in semantic technology. Machine learning (ML) algorithms can analyze vast datasets to identify emerging trends, predict potential research directions, and suggest novel hypotheses. By assisting researchers in identifying promising avenues for exploration, AI will accelerate the pace of discovery in the life sciences, ultimately leading to breakthroughs in drug development, personalized medicine, and more.

What advice would you give to emerging life sciences companies looking to utilize an ontology-led approach to drug discovery? What’s a good starting place?

If you’re an emerging life sciences company interested in using ontologies for drug discovery, here’s some practical advice to get started:

  • Focus on communicating outcomes: This is something that came through loud and clear at BioTechX: communicate value, not process. FAIR data is a means to an end, but what is the end, and how will it benefit the organization?
  • Understand ontologies: First, understand what ontologies are and why they matter. Think of them as structured frameworks that help organize complex scientific data. Knowing this basic concept is crucial.
  • Know your goals: Figure out your specific needs for both the content and structure of the data and the questions you are looking to address. Do you need help with data integration, finding relevant information, or sharing knowledge efficiently? Knowing your goals will guide your ontology development.
  • Collaborate and learn: Don’t go it alone. Collaborate with experts who know what ontology support is available in the public domain and where these public ontologies will need to be augmented to support your data. Depending on how ontologies will be applied to your data, they may also need to be optimized by subsetting them into more manageable chunks, handling ambiguous terms, and adding synonyms.
  • Build on existing work: You don’t have to start from scratch. There are many public ontologies available in the life sciences that represent a consensus view of the field. Adopting these public ontologies is a great starting point: it saves time compared with developing something from scratch, and leveraging the public identifiers they contain will improve data interoperability.
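
To make the optimization steps mentioned in the list above more concrete, here is a minimal Python sketch of subsetting an ontology, adding synonyms, and flagging ambiguous terms. Everything in it is an assumption made for illustration: the mini-ontology, its `EX:` identifiers, and the function names are hypothetical and do not come from any public ontology or SciBite product.

```python
from collections import defaultdict

# Hypothetical mini-ontology: term ID -> (label, parent ID or None).
TERMS = {
    "EX:1": ("disease", None),
    "EX:2": ("common cold", "EX:1"),
    "EX:3": ("breast cancer", "EX:1"),
    "EX:4": ("environmental condition", None),
    "EX:5": ("cold temperature", "EX:4"),
}

# Hypothetical synonyms added on top of the labels.
SYNONYMS = {
    "EX:2": {"cold"},
    "EX:3": {"breast carcinoma"},
    "EX:5": {"cold"},
}


def subset(root):
    """Return the IDs of `root` and all of its descendants."""
    keep = {root}
    changed = True
    while changed:
        changed = False
        for term_id, (_, parent) in TERMS.items():
            if parent in keep and term_id not in keep:
                keep.add(term_id)
                changed = True
    return keep


def build_lookup(term_ids):
    """Map each lower-cased label or synonym to the set of term IDs it names."""
    lookup = defaultdict(set)
    for term_id in term_ids:
        names = {TERMS[term_id][0]} | SYNONYMS.get(term_id, set())
        for name in names:
            lookup[name.lower()].add(term_id)
    return dict(lookup)


def ambiguous(lookup):
    """Return the names that point at more than one term."""
    return sorted(name for name, ids in lookup.items() if len(ids) > 1)


full_lookup = build_lookup(TERMS)
disease_lookup = build_lookup(subset("EX:1"))
print(ambiguous(full_lookup))     # names that need disambiguation
print(ambiguous(disease_lookup))  # the same check on the disease branch only
```

In this toy data, "cold" is ambiguous across the full ontology but not within the disease branch, which is one reason a subset can be easier to apply to a specific dataset than the whole ontology.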

In short, learning the basics, setting clear goals, collaborating, and leveraging existing ontologies will give your life sciences company a strong start in using ontologies for drug discovery. It’s a practical way to streamline your research efforts and stay competitive in the field.

Earlier this year, SciBite launched a new tool, Workbench, to take the effort out of tabular data curation. Can you tell us why this technology is so important and what kinds of challenges it can solve?

The launch of SciBite’s new tool, Workbench, represents a significant milestone in data curation, addressing critical challenges that have long plagued researchers and organizations. Workbench is a game-changer because it simplifies and streamlines the often tedious and time-consuming process of curating tabular data. This technology is vital for several reasons.

Firstly, tabular data is ubiquitous across the life sciences and other industries. Researchers and data scientists often work with spreadsheets filled with valuable information. Still, the process of extracting, transforming, and loading this data into a format suitable for analysis can be incredibly complex and error-prone. Workbench automates these steps, reducing the potential for human errors and significantly accelerating the data curation process.

Secondly, the tool promotes data consistency and standardization. Maintaining consistency in how data is curated and structured is crucial in the life sciences, where data accuracy is paramount. Workbench ensures that data is curated following best practices and conforms to industry standards and ontologies. This consistency is vital for meaningful data integration and analysis across different datasets and sources.
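
As a generic illustration of what curating a table against an ontology involves, the sketch below normalizes a free-text column to identifiers and flags values it cannot map. It is not Workbench's API and makes no claim about how Workbench works: the mapping, the `EX:` identifiers, the sample CSV, and the `curate` function are hypothetical, and it uses only Python's standard `csv` module.

```python
import csv
import io

# Hypothetical mapping from normalized cell values to ontology identifiers.
TISSUE_IDS = {
    "liver": "EX:0001",
    "hepatic tissue": "EX:0001",
    "kidney": "EX:0002",
}

# Made-up sample table with inconsistent casing, spacing, and a typo.
RAW = """sample,tissue
S1,Liver
S2, hepatic tissue
S3,kidny
"""


def curate(raw_csv, column, mapping):
    """Add a `<column>_id` field to each row and collect values with no mapping."""
    rows, unmapped = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        value = row[column].strip().lower()
        row[column + "_id"] = mapping.get(value, "")
        if not row[column + "_id"]:
            unmapped.append(row[column])
        rows.append(row)
    return rows, unmapped


ROWS, UNMAPPED = curate(RAW, "tissue", TISSUE_IDS)
for row in ROWS:
    print(row["sample"], row["tissue_id"] or "UNMAPPED")
print("Needs review:", UNMAPPED)
```

Values that differ only in spelling or spacing end up with the same identifier, while the misspelled entry is surfaced for a human to review rather than silently dropped.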

Additionally, Workbench enhances data accessibility and collaboration. Providing a user-friendly interface and automating many of the data curation tasks empowers a broader range of team members to contribute to the process. This democratization of data curation fosters cross-functional collaboration and accelerates organizational decision-making processes.

SciBite has been a longstanding partner of CCC. Over these years, what do you think has changed the most in terms of the combined value our partnership brings to customers?

Over the years, the SciBite and CCC partnership has seen exciting changes that greatly benefit our mutual customers. A key to the success of this partnership has been the fusion of SciBite’s technical expertise and semantic technology with CCC’s content and copyright management capabilities to provide customers with a powerful search experience over copyright-compliant content. We have also seen a shift within knowledge management groups toward a greater appreciation of the benefits that semantic search provides.

One of the most significant shifts has been the adoption of semantically enriched, machine-readable data. This tech fusion has supercharged CCC’s solutions, enabling customers to dig deeper into scientific data and make smarter decisions. The blend of SciBite’s text analytics and CCC’s content management expertise has created a dynamic system that not only helps users find content but also provides rich context for those in fields like life sciences.

Another noteworthy change is the partnership’s ability to adapt to customer needs. We have expanded our services to offer more comprehensive solutions that cover the entire research process. Whether it’s content discovery, data integration, ontology-driven drug discovery, or data curation, the partnership has evolved to tackle the ever-shifting challenges customers face. This means users now have a one-stop shop for managing knowledge and making well-informed decisions.

SciBite and CCC’s long-lasting partnership has transformed to offer a more holistic and data-driven approach to knowledge management. This evolution helps customers stay ahead in their industries, highlighting the flexibility and dedication of both organizations to innovate and deliver exceptional value to their users.

Author: Christine McCarty

Christine Wyman McCarty is Product Marketing Director for corporate solutions at CCC. Through over a decade of experience working with clients at R&D-intensive companies, she has gained an understanding of the challenges they face in finding, accessing, and deriving insight from published content. She draws on this expertise to shape innovative product offerings that solve market problems. Christine has held a variety of positions at CCC, including roles in software implementation and product management. Christine has a Master’s in Library and Information Science from Simmons University and practiced librarianship for several years before finding her passion for helping companies digitalize their knowledge workflows with software.