Best Machine Learning Data Catalog Software

Researched and written by Shalaka Joshi

Machine learning data catalogs allow companies to categorize, access, interpret, and collaborate around company data across multiple data sources, while maintaining a high level of governance and access management. Artificial intelligence underpins many features of machine learning data catalogs, enabling functionality such as machine learning recommendations, natural language querying, and dynamic data masking for enhanced security.

Companies can use machine learning data catalogs to maintain data sets in a single location so that searching for and discovering data is simple for everyday business users and analysts alike. Users can comment on, share, and recommend data sets so colleagues immediately understand what they are querying. Additionally, IT administrators can set up user provisioning so that unauthorized employees cannot access sensitive data.

Machine learning data catalogs are most frequently implemented by companies that have multiple data sources, are searching for one source of truth, and are attempting to scale data usage company-wide. These products are generally administered by IT departments, which maintain organization and security, while the data itself can be accessed by data scientists, analysts, and everyday business users. The data can then be transformed, modeled, and visualized either directly in the machine learning data catalog or through an integration with business intelligence software.

Not all machine learning data catalogs provide data preparation capabilities; those that do not may require an integration with a business intelligence platform. Additionally, these tools differ from master data management software due to their enhanced governance, collaboration, and machine learning functionality.

To qualify for inclusion in the Machine Learning Data Catalog category, a product must:

  • Organize and consolidate data from all company sources in a single repository
  • Provide user access management for security and data governance purposes
  • Allow business users to search and access the data from within the catalog
  • Offer collaboration features around data sets, including categorizing, commenting, and sharing
  • Give intelligent recommendations based on machine learning for quicker access to relevant data

G2 takes pride in showing unbiased reviews on user satisfaction in our ratings and reports. We do not allow paid placements in any of our ratings, rankings, or reports.

87 Listings in Machine Learning Data Catalog Available

Atlan
4.5 out of 5 (118 reviews)
4th Easiest To Use in Machine Learning Data Catalog software
  • Product Description
    This description is provided by the seller.

    Built by a data team, for data teams, Atlan is THE Active Metadata platform for enterprises to find, trust, and govern AI-ready data, and a leader in The Forrester Wave™: Enterprise Data Catalogs, Q3

    Users: No information available
    Industries
    • Financial Services
    • Information Technology and Services
    Market Segment
    • 54% Mid-Market
    • 40% Enterprise
  • Atlan Pros and Cons
    Pros and Cons are compiled from review feedback and grouped into themes to provide an easy-to-understand summary of user reviews.
    Pros
    • Ease of Use (22)
    • Features (16)
    • User Interface (15)
    • Data Lineage (14)
    • Customer Support (11)
    Cons
    • Lacking Features (6)
    • Limited Functionality (6)
    • Data Lineage Issues (5)
    • Integration Issues (5)
    • Learning Curve (5)
  • Atlan features and usability ratings that predict user satisfaction
    • Ease of Use: 9.0 (Average: 8.6)
    • Business and Data Glossary: 9.2 (Average: 8.5)
    • Metadata Management: 9.3 (Average: 8.4)
    • Data Lineage: 9.3 (Average: 8.6)
  • Seller Details
    Seller: Atlan
    Year Founded: 2019
    HQ Location: New York, US
    Twitter: @AtlanHQ (9,589 Twitter followers)
    LinkedIn® Page: in.linkedin.com (548 employees on LinkedIn®)

AWS Glue
4.3 out of 5 (194 reviews)
1st Easiest To Use in Machine Learning Data Catalog software
  • Product Description
    This description is provided by the seller.

    AWS Glue is a serverless data integration service that makes it easier for analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and app

    Users
    • Data Engineer
    • DevOps Engineer
    Industries
    • Information Technology and Services
    • Computer Software
    Market Segment
    • 48% Enterprise
    • 28% Mid-Market
  • AWS Glue Pros and Cons
    Pros
    • Ease of Use (4)
    • Simple (3)
    • Easy Integrations (2)
    • ETL Solutions (2)
    • Features (2)
    Cons
    • Complex Transformations (2)
    • Feature Limitations (2)
    • Limited Functionality (2)
    • Missing Features (2)
    • Access Control (1)
  • AWS Glue features and usability ratings that predict user satisfaction
    • Ease of Use: 8.4 (Average: 8.6)
    • Business and Data Glossary: 8.9 (Average: 8.5)
    • Metadata Management: 8.6 (Average: 8.4)
    • Data Lineage: 8.7 (Average: 8.6)
  • Seller Details
    Year Founded: 2006
    HQ Location: Seattle, WA
    Twitter: @awscloud (2,234,689 Twitter followers)
    LinkedIn® Page: www.linkedin.com (143,584 employees on LinkedIn®)
    Ownership: NASDAQ: AMZN

Google Cloud Data Catalog
4.4 out of 5 (28 reviews)
  • Product Description
    This description is provided by the seller.

    A fully managed and highly scalable data discovery and metadata management service.

    Users: No information available
    Industries
    • Computer Software
    Market Segment
    • 46% Small-Business
    • 29% Mid-Market
  • Google Cloud Data Catalog features and usability ratings that predict user satisfaction
    • Ease of Use: 8.7 (Average: 8.6)
    • Business and Data Glossary: 8.5 (Average: 8.5)
    • Metadata Management: 9.1 (Average: 8.4)
    • Data Lineage: 7.8 (Average: 8.6)
  • Seller Details
    Seller: Google
    Year Founded: 1998
    HQ Location: Mountain View, CA
    Twitter: @google (32,788,922 Twitter followers)
    LinkedIn® Page: www.linkedin.com (316,397 employees on LinkedIn®)
    Ownership: NASDAQ:GOOG

Cloudera Data Platform
  • Product Description
    This description is provided by the seller.

    Cloudera Navigator is a complete data governance solution for Hadoop, offering critical capabilities such as data discovery, continuous optimization, audit, lineage, metadata management, and policy en

    Users: No information available
    Industries
    • Information Technology and Services
    Market Segment
    • 48% Enterprise
    • 38% Small-Business
  • Cloudera Data Platform features and usability ratings that predict user satisfaction
    • Ease of Use: 8.1 (Average: 8.6)
    • Business and Data Glossary: 8.9 (Average: 8.5)
    • Metadata Management: 9.1 (Average: 8.4)
    • Data Lineage: 8.8 (Average: 8.6)
  • Seller Details
    Seller: Cloudera
    Year Founded: 2008
    HQ Location: Palo Alto, CA
    Twitter: @cloudera (107,628 Twitter followers)
    LinkedIn® Page: www.linkedin.com (3,270 employees on LinkedIn®)
    Phone: 888-789-1488

Sifflet
4.7 out of 5 (66 reviews)
3rd Easiest To Use in Machine Learning Data Catalog software
Entry Level Price: Contact Us
  • Product Description
    This description is provided by the seller.

    Sifflet is a comprehensive data observability solution designed to assist data engineers and data consumers in gaining complete visibility into their data stacks. This platform can be deployed as a So

    Users
    • Data Engineer
    Industries
    • Information Technology and Services
    • Financial Services
    Market Segment
    • 64% Mid-Market
    • 32% Enterprise
  • Sifflet Pros and Cons
    Pros
    • Monitoring (19)
    • Data Lineage (17)
    • Efficiency Improvement (15)
    • Data Quality (13)
    • Ease of Use (13)
    Cons
    • Limited Integration (8)
    • Complex Setup (7)
    • Limited Customization (7)
    • Lineage Issues (6)
    • Alert Management (4)
  • Sifflet features and usability ratings that predict user satisfaction
    • Ease of Use: 9.2 (Average: 8.6)
    • Three further ratings: No information available
  • Seller Details
    Seller: Sifflet
    Year Founded: 2021
    HQ Location: Paris, Ile-de-France
    Twitter: @Siffletdata (390 Twitter followers)
    LinkedIn® Page: www.linkedin.com (44 employees on LinkedIn®)

Appen
  • Product Description
    This description is provided by the seller.

    Appen collects and labels images, text, speech, audio, video, and other data to create training data used to build and continuously improve the world’s most innovative artificial intelligence systems.

    Users: No information available
    Industries
    • Information Technology and Services
    Market Segment
    • 58% Small-Business
    • 26% Mid-Market
  • Appen Pros and Cons
    Pros
    • Efficiency Improvement (2)
    • Analytics (1)
    • Customer Support (1)
    • Customization (1)
    • Data Accuracy (1)
    Cons
    • Difficult Learning (1)
    • Low Compensation (1)
    • Work Interruptions (1)
  • Appen features and usability ratings that predict user satisfaction
    • Ease of Use: 8.1 (Average: 8.6)
    • Business and Data Glossary: 8.2 (Average: 8.5)
    • Metadata Management: 8.0 (Average: 8.4)
    • Data Lineage: 7.8 (Average: 8.6)
  • Seller Details
    Seller: Appen
    Year Founded: 1996
    HQ Location: Kirkland, Washington, United States
    LinkedIn® Page: www.linkedin.com (19,157 employees on LinkedIn®)
    Ownership: ASX:APX
    Total Revenue (USD mm): $244,900

decube
Entry Level Price: Free
  • Product Description
    This description is provided by the seller.

    Decube is the all-in-one Data Trust Platform designed for the modern data stack. Our mission is to make your data reliable, easily discoverable, and constantly monitored across your entire organizatio

    Users: No information available
    Industries
    • Information Technology and Services
    Market Segment
    • 36% Small-Business
    • 32% Mid-Market
  • decube Pros and Cons
    Pros
    • User Interface (5)
    • User Experience (4)
    • UX Design (4)
    • Data Cataloging (3)
    • Ease of Use (3)
    Cons
    • API Limitations (1)
    • Connector Issues (1)
    • Limited Functionality (1)
    • Missing Features (1)
    • Monitoring Issues (1)
  • decube features and usability ratings that predict user satisfaction
    • Ease of Use: 9.4 (Average: 8.6)
    • Business and Data Glossary: 9.7 (Average: 8.5)
    • Metadata Management: 9.7 (Average: 8.4)
    • Data Lineage: 9.6 (Average: 8.6)
  • Seller Details
    Year Founded: 2022
    HQ Location: Kuala Lumpur
    Twitter: @decube_data (113 Twitter followers)
    LinkedIn® Page: www.linkedin.com (40 employees on LinkedIn®)

Secoda
4.5 out of 5 (55 reviews)
5th Easiest To Use in Machine Learning Data Catalog software
  • Product Description
    This description is provided by the seller.

    Secoda is an AI-powered data governance platform designed to help organizations explore, understand, and utilize their data effectively. By providing a comprehensive platform that connects to 75+ data

    Users: No information available
    Industries
    • Computer Software
    • Financial Services
    Market Segment
    • 65% Mid-Market
    • 18% Small-Business
  • Secoda Pros and Cons
    Pros
    • Ease of Use (31)
    • Features (25)
    • Customer Support (21)
    • Data Lineage (19)
    • Integrations (16)
    Cons
    • Bug Issues (11)
    • Bugs (11)
    • Technical Issues (9)
    • Learning Curve (5)
    • Missing Features (5)
  • Secoda features and usability ratings that predict user satisfaction
    • Ease of Use: 8.2 (Average: 8.6)
    • Business and Data Glossary: 9.3 (Average: 8.5)
    • Metadata Management: 9.5 (Average: 8.4)
    • Data Lineage: 8.9 (Average: 8.6)
  • Seller Details
    Seller: Secoda
    Year Founded: 2021
    HQ Location: Toronto, CA
    Twitter: @SecodaHQ (934 Twitter followers)
    LinkedIn® Page: www.linkedin.com (50 employees on LinkedIn®)

Common Voice dataset
  • Product Description
    This description is provided by the seller.

    Each entry in the dataset consists of a unique MP3 and corresponding text file. Many of the 1,368 recorded hours in the dataset also include demographic metadata like age, sex, and accent that can hel

    Users: No information available
    Industries: No information available
    Market Segment
    • 64% Small-Business
    • 27% Mid-Market
  • Common Voice dataset features and usability ratings that predict user satisfaction
    • Ease of Use: 8.2 (Average: 8.6)
    • Business and Data Glossary: 6.8 (Average: 8.5)
    • Metadata Management: 8.2 (Average: 8.4)
    • Data Lineage: 6.8 (Average: 8.6)
  • Seller Details
    Seller: Mozilla
    Year Founded: 2005
    HQ Location: San Francisco, CA
    Twitter: @mozilla (269,532 Twitter followers)
    LinkedIn® Page: www.linkedin.com (1,755 employees on LinkedIn®)

Collibra
4.2 out of 5 (97 reviews)
7th Easiest To Use in Machine Learning Data Catalog software
  • Product Description
    This description is provided by the seller.

    Try Collibra for free @ Collibra.com/tour Collibra is for organizations with complex data challenges, hybrid data ecosystems—and big ambitions for data and AI. We help organizations who are trying

    Users: No information available
    Industries
    • Financial Services
    • Banking
    Market Segment
    • 73% Enterprise
    • 20% Mid-Market
  • Collibra Pros and Cons
    Pros
    • Features (10)
    • Ease of Use (9)
    • Integrations (9)
    • Data Management (8)
    • Data Governance (7)
    Cons
    • Limited Functionality (7)
    • Missing Features (5)
    • Complexity Issues (4)
    • Complex Setup (4)
    • Improvement Needed (4)
  • Collibra features and usability ratings that predict user satisfaction
    • Ease of Use: 8.0 (Average: 8.6)
    • Business and Data Glossary: 8.1 (Average: 8.5)
    • Metadata Management: 7.7 (Average: 8.4)
    • Data Lineage: 7.7 (Average: 8.6)
  • Seller Details
    Seller: Collibra
    Year Founded: 2008
    HQ Location: New York, New York
    Twitter: @collibra (5,760 Twitter followers)
    LinkedIn® Page: www.linkedin.com (1,044 employees on LinkedIn®)

Select Star
4.5 out of 5 (53 reviews)
6th Easiest To Use in Machine Learning Data Catalog software
  • Product Description
    This description is provided by the seller.

    Select Star is a modern data governance platform that helps organizations manage and understand their data at scale, enabling AI, analytics, and self-service across the business. It automatically c

    Users: No information available
    Industries
    • Information Technology and Services
    • Real Estate
    Market Segment
    • 51% Mid-Market
    • 38% Enterprise
  • Select Star Pros and Cons
    Pros
    • Data Lineage (7)
    • Ease of Use (5)
    • User Interface (5)
    • Data Discovery (4)
    • Insights (3)
    Cons
    • Limited Functionality (3)
    • Lineage Limitations (2)
    • Complexity (1)
    • Complex Setup (1)
    • Difficult Learning (1)
  • Select Star features and usability ratings that predict user satisfaction
    • Ease of Use: 8.9 (Average: 8.6)
    • Business and Data Glossary: 8.2 (Average: 8.5)
    • Metadata Management: 8.7 (Average: 8.4)
    • Data Lineage: 8.9 (Average: 8.6)
  • Seller Details
    Year Founded: 2020
    HQ Location: San Francisco, CA
    Twitter: @selectstarhq (388 Twitter followers)
    LinkedIn® Page: www.linkedin.com (24 employees on LinkedIn®)
  • Product Description
    This description is provided by the seller.

    A machine-learning-based data catalog that allows to classify and organize data assets across cloud, on-premises, and big data. It provides maximum value and reuse of data across enterprise.

    Users: No information available
    Industries
    • Information Technology and Services
    • Computer Software
    Market Segment
    • 53% Enterprise
    • 26% Mid-Market
  • Informatica Enterprise Data Catalog features and usability ratings that predict user satisfaction
    • Ease of Use: 7.8 (Average: 8.6)
    • Business and Data Glossary: 7.7 (Average: 8.5)
    • Metadata Management: 8.0 (Average: 8.4)
    • Data Lineage: 8.3 (Average: 8.6)
  • Seller Details
    Year Founded: 1993
    HQ Location: Redwood City, CA
    Twitter: @Informatica (100,810 Twitter followers)
    LinkedIn® Page: www.linkedin.com (5,355 employees on LinkedIn®)
    Ownership: NYSE: INFA
  • Product Description
    This description is provided by the seller.

    IBM Watson® Knowledge Catalog is a unified data catalog that can help your data users quickly find, curate, categorize and share data, analytical models and their relationships with other members of y

    Users: No information available
    Industries: No information available
    Market Segment
    • 42% Enterprise
    • 32% Small-Business
  • IBM Knowledge Catalog features and usability ratings that predict user satisfaction
    • Ease of Use: 8.7 (Average: 8.6)
    • Business and Data Glossary: 7.5 (Average: 8.5)
    • Metadata Management: 7.5 (Average: 8.4)
    • Data Lineage: 8.3 (Average: 8.6)
  • Seller Details
    Seller: IBM
    Year Founded: 1911
    HQ Location: Armonk, NY
    Twitter: @IBM (714,643 Twitter followers)
    LinkedIn® Page: www.linkedin.com (328,966 employees on LinkedIn®)
    Ownership: SWX:IBM
4.2 out of 5 (12 reviews)
  • Product Description
    This description is provided by the seller.

    data.world is the most-adopted data catalog and governance platform on the market. Built on a unique knowledge graph foundation, data.world seamlessly integrates with your existing systems. We set

    Users: No information available
    Industries: No information available
    Market Segment
    • 67% Small-Business
    • 25% Mid-Market
  • data.world Pros and Cons
    Pros and Cons are compiled from review feedback and grouped into themes to provide an easy-to-understand summary of user reviews.
    Pros
    • Ease of Use: 4
    • Data Discovery: 2
    • Data Visualization: 2
    • Integrations: 2
    • Analytics: 1
    Cons
    • Data Duplication: 1
    • Data Inaccuracy: 1
    • Data Quality: 1
    • Learning Curve: 1
    • Missing Features: 1
  • data.world features and usability ratings that predict user satisfaction
    • Ease of Use: 8.8 (Average: 8.6)
    • Business and Data Glossary: 9.2 (Average: 8.5)
    • Metadata Management: 8.8 (Average: 8.4)
    • Data Lineage: 9.3 (Average: 8.6)
  • Seller Details
    Year Founded: 2016
    HQ Location: Austin, Texas
    Twitter: @datadotworld (5,572 Twitter followers)
    LinkedIn® Page: www.linkedin.com (115 employees on LinkedIn®)
4.7 out of 5 (64 reviews)
2nd Easiest To Use in Machine Learning Data Catalog software
  • Product Description
    This description is provided by the seller.

    Coalesce Catalog is a collaborative, automated data discovery & catalog tool. We believe that data people spend way too much time trying to find and understand their data. Coalesce Catalog

    Users: No information available
    Industries
    • Information Technology and Services
    • Financial Services
    Market Segment
    • 59% Mid-Market
    • 27% Enterprise
  • Coalesce Catalog (formerly CastorDoc) Pros and Cons
    Pros and Cons are compiled from review feedback and grouped into themes to provide an easy-to-understand summary of user reviews.
    Pros
    • Collaboration: 2
    • Ease of Use: 2
    • Centralized Management: 1
    • Connectivity: 1
    • Data Governance: 1
    Cons
    This product has not yet received any negative sentiments.
  • Coalesce Catalog (formerly CastorDoc) features and usability ratings that predict user satisfaction
    • Ease of Use: 9.6 (Average: 8.6)
    • Business and Data Glossary: 9.9 (Average: 8.5)
    • Metadata Management: 9.9 (Average: 8.4)
    • Data Lineage: 9.9 (Average: 8.6)
  • Seller Details
    Seller: Coalesce
    Year Founded: 2020
    HQ Location: San Francisco, CA
    LinkedIn® Page: www.linkedin.com (139 employees on LinkedIn®)

Learn More About Machine Learning Data Catalog Software

What is a Machine Learning Data Catalog?

A machine learning data catalog (MLDC) is an automated data catalog that handles tasks such as crawling metadata, cataloging data assets, and classifying personally identifiable information (PII). It organizes a company's inventory of data sets using their metadata.

Data catalogs help companies know where their data is stored, which reduces the time it takes to locate data and makes it easier to access for analytics. They are inventories of assets such as tables, schemas, files, and charts, and they help address a company's data discovery, quality, and governance challenges.
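
To make the PII classification task mentioned above more concrete, here is a minimal Python sketch. It is not how any particular vendor implements classification: the pattern set, the `classify_column` and `classify_table` names, and the 0.8 match threshold are all assumptions chosen for illustration, and a production catalog would likely rely on richer rules or trained models.

```python
import re

# Hypothetical patterns for illustration only; real catalogs use richer
# rules or trained classifiers. Order matters: the first match wins.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?\d[\d\s().-]{7,}\d$"),
}


def classify_column(sample_values, threshold=0.8):
    """Return the first PII label matching at least `threshold` of the
    non-empty sample values, or None if no pattern qualifies."""
    values = [v.strip() for v in sample_values if v and v.strip()]
    if not values:
        return None
    for label, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return label
    return None


def classify_table(columns):
    """Map each column name to a PII label (or None) from sampled values."""
    return {name: classify_column(samples) for name, samples in columns.items()}


if __name__ == "__main__":
    table = {
        "contact": ["ana@example.com", "bo@example.org", "cy@example.net"],
        "tax_id": ["123-45-6789", "987-65-4321"],
        "city": ["Austin", "Armonk"],
    }
    print(classify_table(table))
```

Sampling values rather than scanning whole tables keeps the sketch cheap; the threshold tolerates a few malformed entries in an otherwise sensitive column.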

What does MLDC Stand For?

MLDC is an acronym for Machine Learning Data Catalog. 

What are the Common Features of Machine Learning Data Catalogs?

Machine learning data catalogs automate many of the manual functions of a traditional data catalog, which is an essential part of any organization's data management strategy. Common features include:

Data ingestion and discovery: Machine learning data catalogs need prebuilt adapters to connect to company systems such as applications, databases, files, and external APIs. These adapters discover metadata from those systems, such as table names, attribute names, and constraints. The feature also provides native integrations with data sources, business intelligence (BI) solutions, and data science tools.

Business glossary: Storing a large amount of data in a repository is of little use unless users understand what that data means. The glossary feature links the data to business terms, giving it more meaning.

Automated data labeling: Labeled data is a prerequisite for supervised machine learning algorithms. Data labeling usually involves annotators tagging data, for example by identifying objects in images, to build quality artificial intelligence (AI) training data. Automated labeling can reduce human error and shorten the tedious annotation cycles that manual labeling requires.

Data lineage: Data lineage helps users know who changed the data, and why, when, and where the changes were made. It is a part of metadata management, and MLDCs automate it. Data lineage also helps determine when new or changed data requires retraining machine learning models. MLDCs usually parse query logs from data lakes and other data sources automatically to create a data lineage map.

Data quality monitoring and anomaly detection: Data quality monitoring helps users understand whether the data came from a trusted source. Machine learning data catalogs also use machine learning algorithms to identify sudden changes in data and immediately alert users to any anomalies detected.

Semantic search for data sets: Machine learning data catalogs provide visual, intuitive search similar to a web search engine. Almost every user in an organization is a data user, but not everyone can write SQL queries. The semantic search feature makes it easier for all users to discover data sets.

Compliance capabilities: This feature ensures that sensitive data is not exposed and that users can trust the data. It also helps keep data governance policies in place and strengthens data management in the organization. Data stewards can identify low-quality data and restrict access to sensitive data, which supports compliance with regulations such as the General Data Protection Regulation (GDPR).

Data profiling: Data profiling examines the data in a source and collects information about it. This helps surface data quality issues early, making the data management process more efficient.
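
As an illustration of the data lineage feature described above, the following Python sketch builds a small lineage map from a list of SQL statements, standing in for a query log. It is a simplified sketch, not any vendor's method: the regular expressions only cover plain `INSERT INTO ... FROM` and `CREATE TABLE ... AS SELECT` statements, and the function names `build_lineage` and `upstream` are hypothetical. Real products would need a full SQL parser.

```python
import re
from collections import defaultdict

# Simplified patterns (an assumption for this sketch): they handle
# "INSERT INTO t ... FROM a JOIN b" and "CREATE TABLE t AS SELECT ... FROM a".
TARGET_RE = re.compile(r"\b(?:insert\s+into|create\s+table)\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def build_lineage(query_log):
    """Return {target_table: set(source_tables)} from an iterable of SQL strings."""
    lineage = defaultdict(set)
    for query in query_log:
        target = TARGET_RE.search(query)
        if not target:
            continue  # read-only query: contributes no lineage edge
        sources = set(SOURCE_RE.findall(query)) - {target.group(1)}
        lineage[target.group(1)].update(sources)
    return dict(lineage)


def upstream(lineage, table):
    """Return all direct and indirect source tables of `table`."""
    seen, stack = set(), [table]
    while stack:
        for src in lineage.get(stack.pop(), ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


if __name__ == "__main__":
    log = [
        "INSERT INTO staging.orders SELECT * FROM raw.orders",
        "CREATE TABLE mart.revenue AS SELECT o.id FROM staging.orders o "
        "JOIN raw.customers c ON o.cid = c.id",
        "SELECT * FROM mart.revenue",
    ]
    lineage = build_lineage(log)
    print(lineage)
    print(upstream(lineage, "mart.revenue"))
```

Walking the map upstream is what lets a catalog answer questions such as which raw sources feed a given report, or which downstream models may need retraining when a source changes.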

What are the Benefits of Machine Learning Data Catalogs?

A machine learning data catalog provides several benefits to different types of users in the organization. These include:

Ease in data curation: Data curation is the process of collecting, organizing, labeling, and cleaning data. Machine learning data catalogs use machine learning algorithms to validate metadata and organize insights into the correct repositories.

Ease of search: Semantic search makes it easier for non-technical users to find and discover data, since they do not have to write SQL queries every time they need access to data.

Ease in data collaboration: Machine learning data catalogs make it easier to find and store siloed data, which helps users collaborate on, use, and share data sets.
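
To show how search without SQL might work, here is a small Python sketch. It is a stand-in, not true semantic search: it uses plain keyword matching with synonym expansion from a business glossary, whereas real products may use embeddings or other models. The `GLOSSARY` contents, the data set names, and the `search` function are all hypothetical.

```python
import re

# Hypothetical business glossary: business term -> technical tokens
# that may appear in data set names or descriptions.
GLOSSARY = {
    "revenue": {"sales", "amount", "rev"},
    "customer": {"client", "cust", "account"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(query_tokens):
    """Add glossary synonyms for every query token that has an entry."""
    expanded = set(query_tokens)
    for tok in query_tokens:
        expanded |= GLOSSARY.get(tok, set())
    return expanded


def search(query, catalog):
    """Rank data set names by how many expanded query terms appear in
    their name or description; ties are broken alphabetically."""
    terms = expand(tokenize(query))
    scored = []
    for name, description in catalog.items():
        score = len(terms & tokenize(name + " " + description))
        if score:
            scored.append((score, name))
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    catalog = {
        "fct_sales": "Daily sales amount per client account",
        "dim_cust": "Client master data",
        "web_logs": "Raw clickstream events",
    }
    print(search("customer revenue", catalog))
```

A business user can type "customer revenue" and still reach tables named `fct_sales` or `dim_cust`, because the glossary bridges business vocabulary and technical naming.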

Who Uses Machine Learning Data Catalogs?

Machine learning data catalogs centralize metadata for various data assets. By organizing that metadata, MLDCs help organizations govern data access.

Data analysts: Data analysts use MLDCs to discover, classify, and manipulate data for their analytics processes. They can also discover AI or machine learning models, understand how they work, and import them into their BI tools. Data catalogs help data analysts turn their companies into self-service organizations, which matters for any organization that wants to be driven by insights. Machine learning data catalogs give users the means to find, understand, and trust data.

Marketers: Marketing teams use machine learning data catalogs for commercial purposes, drawing on cataloged data for insights that support better decisions.

Data scientists: Data scientists usually publish their models for reuse and look for a single platform that centralizes data for different projects.

Challenges with Machine Learning Data Catalogs

Although machine learning data catalogs help solve major challenges of traditional data catalogs, such as data discovery and data lineage, MLDCs come with challenges of their own.

Scalability: Not every MLDC can support a huge volume of metadata, and some data catalogs break down due to performance issues when overloaded with it. Data was once stored mainly in a company's mainframe data center. With today's big data, however, machine learning data catalogs must keep track of data in both the cloud and data lakes.

Fragmentation in evaluating a product: If a data catalog is too bulky, it fragments the user's journey of evaluating a product. Too much data pushes users toward too many tools, breaking a seamless experience into fragments.

How to Buy Machine Learning Data Catalogs

Requirements Gathering (RFI/RFP) for Machine Learning Data Catalogs

Machine learning data catalogs offer many features to help users identify usable data, and a buyer can choose the right MLDC software depending on the organization's needs. A request for information (RFI) or request for proposal (RFP) helps the organization gather details on pricing, product features, and guidelines.

Compare Machine Learning Data Catalog Products

Create a long list

The first step is to identify all the possible players in the space. This makes it possible to evaluate vendors on price, product features, and customer service.

Create a short list

After evaluating the potential vendors, the company can narrow the list to those that meet all of its requirements.

Conduct demos

Demos help in understanding the product as a whole. A team of IT professionals and data scientists should join these demos to understand the product's functionality, while the marketing team can join to assess how the software would be used in its projects.

Selection of Machine Learning Data Catalogs

Choose a selection team

A team of marketing professionals, data scientists, and IT professionals can raise any questions about the MLDC product with the vendors. A data scientist would be more interested in the software's technical features, a marketing manager in how the marketing team could use the MLDC for its projects, and an IT professional in the software installation procedure.

Negotiation

Once the vendor quotes the price, the negotiations begin. The price is settled based on the cost of similar products available in the market and the extent to which the product can solve the buyer's challenges.

Final decision

The final decision is based on agreements between the vendor and the buyer.