
Kipi Life Science DataHub – Clinical LLM


In the healthcare industry, the wealth of unstructured text data stored across disparate systems presents a significant challenge for extracting actionable insights vital for advancing medical research, improving patient care, and ensuring regulatory compliance. This data includes scientific literature, clinical notes, and regulatory documents, each containing valuable information critical to the success and safety of healthcare practices.

Unlock the power of Data


To address this challenge, a comprehensive text analytics solution tailored specifically for the healthcare sector can be implemented within Snowflake, a leading Data Cloud platform. Leveraging advanced natural language processing (NLP) techniques, such as Retrieval-Augmented Generation (RAG), this solution promises to autonomously extract and synthesize meaningful insights from unstructured text data.


RAG is particularly transformative as it not only retrieves relevant information but also generates responses based on the retrieved knowledge, significantly enhancing the depth and relevance of insights obtained from healthcare texts. This technology enables healthcare organizations to unlock the full potential of their unstructured data, accelerating medical breakthroughs, improving patient care, and ensuring regulatory compliance.

Here is a high-level architecture diagram of the solution:

With its continuing AI releases, Snowflake now natively supports the vectorization of contextual text that a RAG model requires, among other capabilities.

This solution uses Cortex’s vector embedding function (now in public preview) to vectorize the document chunks and store them in a vector table in Snowflake. Snowflake’s VECTOR_COSINE_SIMILARITY function finds the relevant vectors, and Cortex’s COMPLETE function generates responses with the model the user selects. Together they implement a RAG-based LLM chatbot.
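As a rough illustration of how the three functions named above could be combined in a single query, here is a minimal Python sketch that only builds the SQL text. It is not the app's actual code: the table name CLINICAL_DOC_CHUNKS, the columns chunk_text and chunk_vec, the embedding model 'e5-base-v2', and the prompt wording are all assumptions made for this example, and the query has not been run against a live account.

```python
# Hypothetical embedding model name; use whichever model your account supports.
EMBED_MODEL = "e5-base-v2"


def build_rag_sql(table: str = "CLINICAL_DOC_CHUNKS", top_k: int = 3) -> str:
    """Return a parameterized RAG query as a string.

    The query has three `?` placeholders, in this order:
    the question (to embed), the LLM name, and the question (for the prompt).
    The table and column names are illustrative assumptions.
    """
    # Only plain identifiers are allowed, because the name is interpolated.
    if not table.replace("_", "").isalnum():
        raise ValueError("table must be a plain identifier")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    return f"""
WITH context AS (
    SELECT chunk_text
    FROM {table}
    ORDER BY VECTOR_COSINE_SIMILARITY(
        chunk_vec,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('{EMBED_MODEL}', ?)
    ) DESC
    LIMIT {int(top_k)}
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    ?,
    CONCAT(
        'Answer the question using only the context. Context: ',
        (SELECT LISTAGG(chunk_text, ' ') FROM context),
        ' Question: ',
        ?
    )
) AS response
""".strip()


if __name__ == "__main__":
    print(build_rag_sql(top_k=5))
```

With a Snowpark session, a query like this could presumably be executed as session.sql(sql, params=[question, model, question]), where model is one of the LLM names the app offers.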

We have packaged the entire functionality in a Snowflake Native App (launching soon on the Snowflake Marketplace) that offers a seamless chatbot experience and removes the deployment and configuration effort otherwise required for installation and sharing.

The numbered icons in the diagram represent the user interaction with the app.

  1. Data engineers upload relevant clinical documents to a Snowflake internal stage.
  2. A Snowpark UDF splits the documents into chunks, vectorizes them, and stores them in a vector table.
  3. The Snowflake vector table serves as the store for RAG.
  4. Business users (the users of the application) provide the prompt to the LLM chatbot. They also choose an LLM from:
  • llama2-70b-chat
  • gemma-7b
  • mistral-7b
  • mixtral-8x7b
  • snowflake-arctic

They also provide a depth value for the cosine similarity search.
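The chunking in step 2 can be pictured with a small Python sketch. The article does not say how the Snowpark UDF splits documents, so the word-based splitting, the chunk size, and the overlap used here are assumptions chosen purely for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    chunk_size and overlap are illustrative defaults, not values from the app.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighbouring chunks, which tends to help retrieval. Each chunk would then be vectorized and written to the vector table.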

  5. The prompt is passed to the Snowflake Cortex cosine similarity function.
  6. The function finds the relevant chunks in the vectorized context tables.
  7. The relevant data is passed to the Cortex COMPLETE function, which generates a natural-language response.
  8. The generated response is sent back to the native app.
  9. Finally, users see the response in the UI.
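The retrieval steps above can be sketched in plain Python to show what happens conceptually. Inside Snowflake this work is done by the built-in functions, so the code below is only an illustration: the toy two-dimensional vectors stand in for real embeddings, k is assumed to correspond to the depth value the user supplies, and the prompt template is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 store: list[tuple[str, list[float]]],
                 k: int) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy store of (chunk text, embedding) pairs; real embeddings are much longer.
    store = [
        ("dosage guidance", [1.0, 0.0]),
        ("billing policy", [0.0, 1.0]),
        ("dosage and billing", [1.0, 1.0]),
    ]
    chunks = top_k_chunks([1.0, 0.1], store, k=2)
    print(build_prompt("What is the recommended dosage?", chunks))
```

The assembled prompt is what would be handed to the LLM, so the model answers from the retrieved chunks rather than from its general training data alone.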

Here is the working demo of the application


The impact of implementing such a solution is far-reaching and profound. By accelerating medical breakthroughs, healthcare organizations can identify emerging trends and discoveries more efficiently, leading to faster innovation and advancements in treatment options. Real-time insights derived from clinical notes and scientific literature can enhance patient care by providing clinicians with timely and relevant information to inform treatment decisions and interventions.

Ensuring regulatory compliance is paramount in the healthcare industry to mitigate risks and penalties. A text analytics solution powered by RAG can automate compliance efforts by extracting critical information from regulatory documents and providing actionable insights to support adherence to evolving regulatory requirements.

Potential Use Cases

Analyzing Clinical Notes: Analyzing clinical notes, scientific literature, research articles, and medical journals. This app can identify relevant studies, extract key findings, and facilitate evidence-based decision-making in medical research and practice.

Drug Safety Monitoring: Monitoring adverse drug reactions by analyzing medical records, patient reports, and pharmacovigilance data, enabling proactive intervention and regulatory reporting.

Social Media Intelligence: Monitoring disease outbreaks and epidemiological trends by analyzing data from sources such as social media, news articles, and public health reports. This app can aid in early detection, rapid response, and resource allocation during public health emergencies.

Clinical Trial Optimization: Optimizing clinical trial design and recruitment strategies by analyzing patient eligibility criteria, trial protocols, and recruitment materials, enabling the identification of eligible patients, prediction of enrollment rates, and optimization of trial outcomes.

Patient Sentiment Analysis: Analyzing patient feedback from surveys and online reviews to gain insights into patient satisfaction, sentiment, and preferences. This helps identify areas for improvement, enhance the patient experience, and inform strategic decision-making.

Regulatory Compliance: Ensuring compliance with regulatory requirements such as HIPAA, GDPR, and FDA regulations by analyzing regulatory documents, policies, and guidelines, as well as interpreting complex regulations, identifying compliance gaps, and implementing corrective actions.

Healthcare Fraud Detection: Detecting healthcare fraud by analyzing claims data, billing records, and provider notes. Identifying suspicious patterns, anomalies, and fraudulent activities enables payers and regulators to mitigate financial losses and protect healthcare integrity.


Automated Literature Reviews: Expedite the process of conducting literature reviews by automating the extraction and analysis of relevant information from scientific publications, saving researchers valuable time and effort.

Clinical Literacy: Helping individuals, including healthcare workers, understand and navigate healthcare information, terminology, and processes, so that they can make informed decisions about patients' health and engage meaningfully in their care.

Personalized Medicine: Tailor treatment plans and interventions based on insights derived from patient data and clinical notes, supporting the delivery of personalized healthcare to individual patients.

Regulatory Reporting: Streamline regulatory reporting by identifying regulatory gaps and by extracting and organizing the relevant data, reducing the risk of penalties.


Ultimately, the implementation of a text analytics solution within Snowflake empowers healthcare organizations to make informed decisions that positively impact patient outcomes, streamline operational processes, and strengthen regulatory adherence. By harnessing the power of advanced NLP technologies like RAG, healthcare providers can unlock the value hidden within their unstructured text data, paving the way for a more efficient and impactful healthcare ecosystem.

May 02, 2024