Datasaur is proud to be building a sovereign, private AI solution for Indonesia, the world's 4th most populous nation. The vast majority of the data used to train current LLMs is linguistically and culturally biased, so leveraging local, relevant data is key to democratizing access to this technology. "Korika Chat was developed through a collaboration between KORIKA (Indonesia AI Industry Research and Innovation Collaboration) and Datasaur AI, a global company specializing in the development of Large Language Model (LLM) platforms. Built on privacy-first principles and open-source architecture, KChat is ready to support various sectors, from state-owned enterprises (BUMN) and public institutions to MSMEs, in delivering efficient and inclusive digital services." https://lnkd.in/gN8trrQq
Datasaur
Software Development
San Francisco Bay Area, California · 3,257 followers
Leading NLP Labeling and Private LLM Development Platform
About us
Datasaur builds Private LLMs for enterprises and governments. Leverage the best of LLM technology without sending any data off your servers.
- Website: http://www.datasaur.ai
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: San Francisco Bay Area, California
- Type: Privately Held
- Founded: 2019
Locations
- Primary: San Francisco Bay Area, California, US
Employees at Datasaur
- Ivan Lee: Founder/CEO @ Datasaur | Private LLMs | LinkedIn Top Voice
- Karol Danutama: VP of Engineering, Datasaur.ai (YC W20) - Data Labeling Software for NLP
- Saripudin: AI Engineer at Datasaur.ai (YC W20) - Data Labeling Software for NLP
- Satrio Wicara Putra: AI Engineer at Datasaur
Updates
-
In our latest post, we define what a Private LLM truly means and why the distinction matters now more than ever. With OpenAI and Anthropic recently updating their data retention policies, it’s worth revisiting the equally powerful options available to enterprises that need the power of modern LLMs without sending sensitive data to third parties. We also highlight how privacy is not a binary choice but a spectrum: each organization must calibrate its requirements across dimensions like data residency, governance, and deployment. Finally, we propose a practical framework for evaluating solutions, helping leaders cut through vague marketing claims and identify the right approach for their specific regulatory, security, and business needs. Read more here: https://lnkd.in/g766ypBU
-
The August LLM Scorecard is here — now featuring OpenAI's new open‑source model gpt-oss and Grok 4! We’ve just published our August 7, 2025 LLM Scorecard, ranking leading language models across Privacy, Quality, Cost, and Speed. Curious how the open model stacks up? Dive in and see the full comparison: https://lnkd.in/gC6gKZgx
-
July Feature Updates! We’ve been busy rolling out new tools to make labeling faster, smarter, and more intuitive.
✅ Labeling Agent: Configure multiple LLMs to label your data, with consensus
🔍 Smarter Search: Instantly find what you need in complex datasets
✏️ Editable Rows: Make quick in-line changes without switching views
Read the full breakdown here: https://lnkd.in/gaCVCP-W
#AI #DataLabeling #MachineLearning #NLP #LLMs #ML #Datasaur
-
We tested 4 top LLMs on a real-world labeling task. One crushed accuracy. One dominated speed. One… took 88 minutes? The full benchmark might surprise you. #LLM #AIbenchmark #DataLabeling #GPT4o #Claude #Gemini #LLaMA #Datasaur
-
Datasaur reposted this
We stopped just short of calling this feature "Vibe Labeling," but I'm very excited to see the release of Labeling Agents on Datasaur. Just as engineers now start in Cursor or Claude Code, annotators should start with an LLM for labeling. This flow is now built natively into the Datasaur platform, so OpenAI, Claude, and Llama models can take the first pass at your annotation work. My favorite setup is having all three take a pass on the data, so a human only needs to review the items where the three LLMs disagree.
We tested 4 of the top LLMs for labeling, and one result totally surprised us. At Datasaur, we're always asking: which model actually performs best for real-world labeling tasks? So we put them to the test. We ran Gemini 2.5 Pro, GPT-4o, Claude 3.7 Sonnet, and LLaMA 3.3 70B through a head-to-head comparison. We looked at:
✅ Accuracy
✅ Coverage
✅ Missed labels
✅ Processing time
The winner? It wasn’t the one we expected. Read the quick summary and the full breakdown to discover which model is best for your labeling needs: https://lnkd.in/g3_WgvY2
#LLM #AI #DataLabeling #Gemini #GPT4o #ClaudeAI #LLaMA #Automation #Datasaur #NLP #openai
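The multi-LLM review flow described in the repost above (several models label the same data, and a human reviews only where they disagree) can be illustrated with a minimal Python sketch. This is not Datasaur's API or its actual consensus logic: the function names `consensus` and `triage`, and the input shape (each item ID mapped to one label per model), are hypothetical and chosen for illustration. It assumes an item is auto-accepted only when every model returns the same label.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def consensus(labels: List[str]) -> Optional[str]:
    """Return the label all models agree on, or None if any model disagrees."""
    if not labels:
        return None  # no predictions at all: nothing to agree on
    label, count = Counter(labels).most_common(1)[0]
    # Require unanimity: the most common label must come from every model.
    return label if count == len(labels) else None


def triage(predictions: Dict[str, List[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Split items into auto-accepted labels and items that need human review.

    `predictions` is a hypothetical structure mapping an item ID to the
    labels proposed by each LLM (e.g. one label from each of three models).
    """
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for item_id, labels in predictions.items():
        agreed = consensus(labels)
        if agreed is None:
            needs_review.append(item_id)  # models disagree: route to a human
        else:
            accepted[item_id] = agreed  # unanimous: accept the label as-is
    return accepted, needs_review


if __name__ == "__main__":
    preds = {
        "doc-1": ["POSITIVE", "POSITIVE", "POSITIVE"],
        "doc-2": ["NEGATIVE", "NEUTRAL", "NEGATIVE"],
    }
    accepted, needs_review = triage(preds)
    print(accepted)      # {'doc-1': 'POSITIVE'}
    print(needs_review)  # ['doc-2']
```

Requiring unanimity keeps human review focused on exactly the disagreements; a majority-vote rule (accepting a label when, say, two of three models agree) would send fewer items to reviewers at the cost of trusting more model-only decisions.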
-
Datasaur's annotation platform was recently used in a first-of-its-kind study by computer science researchers at the Stanford School of Engineering on Sindhi, an Indo-Aryan language spoken by 40 million people. Despite its widespread use, Sindhi is considered "low-resource," meaning it has largely been left behind by the rapid AI advancements benefiting other languages. Our platform supports languages from around the world, including right-to-left and symbol-based scripts, which aligns perfectly with our mission to democratize access to Natural Language Processing (NLP) for everyone.
-
Datasaur reposted this
👀 A rare glimpse of a day-in-the-life at Datasaur offices, in case you wanted to see who's building your Private LLMs. (Our project manager Hafezd El Daffa was just playing around with Veo, and I thought this was neat)