Video: January Town Hall | Duration: 50:47 | Summary: January Town Hall | Chapters: Welcome and Introduction (0:00), DataHub Cloud Trial (2:52), AI Integration Updates (20:01), Agent Context Kit (28:20), DataHub Cloud Features (35:32), Expanding Workflow Support (37:24), AI Plugins Integration (39:36), Debugging Data Quality (40:47), Plugin Availability Details (47:42), Future Improvements Roadmap (49:27), Conclusion and Thanks (50:27)
Transcript for "January Town Hall":
Hello, everyone, and welcome to the first DataHub town hall of 2026. Before we dive in, I see there's already some activity in the chat. We'd love to hear from you: where are you joining us from today? It's always so cool to see how far and wide this community reaches. This session is being recorded, so you can watch on demand and revisit all the topics we're covering today. If you have any questions, put them in the chat or the Q&A section, and we'll do our very best to answer them. If you're new here, welcome. Town halls are all about sharing what's next, learning from real implementations, and connecting with the people building cool stuff in DataHub. A giant thank-you to everyone who has made the town hall part of their routine; it means a lot. And to everyone here championing open source: this community exists because of you. If you haven't already, join the conversation in our Slack community. You can use the QR code on the screen if you're new, or there's a link in the docs section next to the chat. We also have a dedicated town hall Slack channel where Shirshanka will be sharing some of his favorite memes, so please be sure to join us over there. And if you've noticed a new look on social, you might have caught a glimpse of something different in our branding. You're right: as DataHub becomes a context layer for enterprise AI, our logo is evolving to reflect that next chapter, while staying true to the open source community and giving a nod to what got us here. This is so exciting. I've watched the team work so hard to come up with something crisp and precise, yet not too far from our roots and the identity all of you are familiar with. I'm excited for this mini evolution: small step for the logo, big step for DataHub. Maybe we can make that joke here. Most importantly, I'm excited because Jen will finally get some new swag, and hopefully the community can enjoy that too. I'm excited for this new chapter. Alright, now for the agenda. First off, Maggie will walk us through a very exciting announcement around the free trial and how to access it. From there, Shirshanka Das, John Joyce, and Nick Adams will continue the conversation from our last town hall: we'll double-click into the context graph and the agent context kit, and we'll see a preview of Ask DataHub plugins. Without further ado, we will turn things over to Maggie. We hope you have a great town hall. Thanks, everyone.

Thank you so much, Jen, and lovely to see everybody. If we haven't met before, I'm Maggie, the founding product manager over at DataHub. Happy New Year to everybody. I'm super excited to talk you through what we've recently rolled out and how you can start using it. Just a couple of weeks ago, we rolled out the ability to start your own free trial with DataHub Cloud. I have a QR code up here and a link you can go to, but basically all you have to do is go to datahub.com slash get DataHub Cloud, fill out a form, and we will give you a free instance of DataHub Cloud for twenty-one days. This is super exciting because it's the first time we're making it really easy to get started with DataHub Cloud and get a sense of what's supported and how you can interact with it.
And so what we're going to do now is actually walk you through what that experience looks like. Let me switch over to my other screen. Give me just a moment, please. And off we go. Alright, what we're looking at here is a very live instance of a DataHub Cloud trial. If you've used DataHub before, you may notice that the home screen looks a little bit different. Since this is a free trial and you're not interacting directly with folks from the DataHub team, we wanted to make it really easy for people to jump right in, start getting familiar with what DataHub has to offer, and give you a roadmap during the trial. You'll notice that on the homepage we have a task list, and right up top we also call out that we have sample data loaded in. So you can jump right in and start understanding how datasets, glossary terms, tags, and lineage are represented. We have an imaginary ecommerce company in there with metadata loaded and ready for you to go. In the getting-started steps, we want to help people get familiar with lineage in DataHub, get familiar with Ask DataHub, which is our AI assistant, and then, if you want to go beyond the sample data, you can absolutely start ingesting your own metadata as well. So we'll take a quick look at exploring data lineage so you can get a sense of what that sample data looks like. In here we have the sample order details dataset in Snowflake, powered by dbt, and you can see the full upstream and downstream lineage of it all. You can explore it as you would in any other lineage scenario and see column-level lineage as well, so it really helps you get familiar with what the lineage experience looks like. You can also come in here and start looking at the example dataset, see its descriptions, the tags associated with it, etcetera. For folks who already have DataHub running, or specifically DataHub Cloud, you'll also notice that you can now interact directly with Ask DataHub in the entity context and ask it about really anything you'd like: how is this used downstream, upstream? John is going to talk a lot about the recent improvements in Ask DataHub, so I'm not going to go into too much detail on this one, but I did want to call out that we now have this embedded AI context directly in here. Alright, if we go back to our homepage, we'll see that we are now progressing through our checklist. Next up is to go interact with Ask DataHub. If we go into this experience, just for the sake of expediency I did a sample one here. Let's say you're brand new to DataHub and you don't really know what the sample data looks like. You can just ask Ask DataHub: I'm new here, what do I do? This will give you an overview of the type of metadata that's in here, and you can navigate directly into specific samples, etcetera.
But this is the same as any other chat or AI agent you're already working with in other tools, and it's a really great way during the free trial experience to get familiar with the data. Additionally, you can always go through our main search and browse experience. You'll notice that in the sample data we have assets represented across a fairly standard data stack: Spark, Snowflake, Tableau for your BI layer, etcetera. The other thing I want to call out, which we rolled out along with free trials and the recent DataHub Cloud update, and which is also coming to open source, is a redesigned ingestion management experience. So let's say you're ready to connect your first source in the free trial, or maybe you're already running DataHub and you want to connect a new source. You'll notice this experience looks a little different if you've seen it before. When you go into select source, we've modernized and cleaned up the experience of choosing your ingestion source. We have it grouped by source type, so a BI layer, a data warehouse, etcetera. We still have the ability to create custom sources in here, so that's included as well, and we've also collapsed some of the less popular or less adopted ones. Everything is still in here; it's just much easier to navigate and jump right in. I actually started a sample ingestion flow to point out what this experience looks like. Before this redesign we had a modal experience and a pop-up. We've really simplified and streamlined all of that, so you just give your ingestion source a name, add an owner or multiple owners if you'd like, and then start configuring it. In the free trial, and also in DataHub Cloud, we make Ask DataHub available within this context as well. So let's say we're running through the Snowflake connector, and maybe we don't know where to get the account ID or don't quite understand the credentials in here. We can ask: how can I find my Snowflake account info? Ask DataHub is going to look at your specific instance and the current state of your recipe, if you've started making progress on it. All of this is trained off of the actual code implementation of DataHub, plus an understanding of the various sources, so it's going to tell you exactly where to go find it, and then you can come in here and start configuring it. Of course, I've already done that.
The other thing is that when teams are setting up sources, particularly data warehouses, there are often storage patterns or naming conventions that indicate the type of dataset and its purpose. So you might want to narrow your ingestion to include, or more often explicitly exclude, datasets or schemas that live in a staging or temporary folder, or maybe your teammates have their own personal development schemas that you don't want to ingest because they would just add noise. This is where you can apply your asset filters. Maybe we'll say we want to exclude any tables matching some pattern. This field actually takes regex, and maybe you're not a regex pro, so let's ask: give me the regex to filter out tables that contain the text "test". Again, our goal here is to minimize the number of times you need to navigate away from DataHub to get your ingestion source up and running. And while this is thinking... oh, that one was very nice and fast. In this scenario it gave us two different ways to look at it. If you're filling this out as YAML, which is where you can configure the more advanced scenarios, it gives you a YAML base you can copy and paste in. But we'll look at these variations: any table starting with test, any table ending in test. I'm going to go ahead and grab this one, and we'll add it to our filter here. You can add as many or as few as you want, and you can delete them, change them, etcetera. The other thing to call out is that we've elevated a lot of the common settings within an ingestion source, so we show you the features and settings that are enabled by default. Let's say you're not familiar with why you would use column profiling, for example. We can use Ask DataHub and say: what happens if I configure column profiling? It would be helpful if I could spell "profiling". Again, this answer is specific to the context of the source you're working with; it's not just a generic explanation of why you'd enable this in DataHub. And it's giving us a really nice summary: with the current setting, no table statistics are being collected; if you enable it, it's going to collect per-column statistics such as null counts, etcetera, and it also gives you the trade-offs. You get better data quality and it helps you identify issues, but because you're profiling every single column in your connection, you're going to see somewhat slower ingestion.
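To make the table filter above concrete, here is a minimal Python sketch of the kind of deny pattern being discussed: a regex that matches any table whose name contains "test". The table names are made up, and the exact recipe field this regex would go into depends on the source, so treat this as an illustration rather than exact recipe syntax.

```python
import re

# Deny pattern of the kind suggested above: match any table whose name
# contains the text "test" (starting with, ending with, or embedding it).
deny_pattern = re.compile(r".*test.*", re.IGNORECASE)

# Hypothetical table names, purely for illustration.
tables = ["fact_orders", "dim_customers_test", "test_scratch", "analytics.daily_revenue"]

ingested = [t for t in tables if not deny_pattern.search(t)]
print(ingested)  # ['fact_orders', 'analytics.daily_revenue']
```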
And then it also gives you a recommended approach that you can fine-tune with that YAML experience. So we'll go ahead and click next. This is where you can configure your sync schedule. You can always just run these manually on a one-off basis, but by default we give it a schedule to run on. Let's go ahead and save and run this. This screen here isn't going to look any different from what you've been seeing; you can still access the full run history across all of your sources. Of course, now we play the waiting game and hope this returns the way we were hoping. Okay, so now when we look at our sources, this is a little bit delayed, which is the fun of a live demo, isn't it? But lo and behold, our initial sync failed. If we go in here, you'll see the same kind of run details output you've seen in the past, just now in a full-page experience. What I'm personally extremely excited about is that we also bring Ask DataHub in to make it really easy for you to understand what's happening. If you still want to see the raw logs, you can of course skim through them here, download the full logs if you'd like, or navigate over to the logs tab. But let's be real: logs can get really long, and it can be hard to find exactly what's going on. So instead of doing that, based on the outcome of that run, we surface some quick prompts for you so you can jump right in. What Ask DataHub is doing in this scenario is looking at your specific configuration, reading in all of those logs, nailing down exactly what happened, and presenting it back to you in plain language. Right now it's presenting its plan back to us. I will point out that we have some configuration coming in an upcoming release, for both the trials and DataHub Cloud, where you can have a kind of fast mode, so you get quicker results, particularly where these large log files can take a little bit longer. So we will have some faster resolution there. So here we have the "here's what happened": the main error is authentication. It's saying an incorrect username or password was specified. Ask DataHub is making a guess here that maybe the username shouldn't have had a double t in it; the reality is that this just isn't a valid user. For the sake of this demo I wanted a really simple scenario. But ultimately, it's scanning through the entirety of those logs, and for things that are a little more nuanced or edge-case about how metadata is ingested into DataHub, it's going to give you a very precise synopsis of what's going on. And then, of course, you can continue the conversation: how do I resolve this? How do I move forward?
But ultimately, instead of needing to go out to our docs site or download all of those logs, we're really trying to make it very, very easy to identify those issues, interpret what's happening, and act on it. And while I'm explaining this in the context of the free trial announcement, the experience of having Ask DataHub in recipe creation and in interpreting run outcomes is available for all DataHub Cloud deployments. It's something I'm very excited to see how folks navigate with; we'll be making some improvements to navigation and interaction, so I'm excited for folks to take it for a spin. I'm going to stop sharing in just a second here, but yeah, I'm super excited to get folks using that experience, both in free trials and in DataHub Cloud. Honestly, it's something I'd been thinking about and hoping we could build since around the middle of last year, so it's very exciting to me that we have it in there, because we want to make it as easy as possible for people to get up and running with ingestion and get things moving. I'll share this one more time so we can show the QR code for people to scan. So, free trials are up and ready to go. We're excited for people to get started, and you're welcome to reach out to me anytime if you have any questions. Otherwise, I'll hand it back to Jen and we'll carry on with the rest of the show. Thanks, folks.

Hello? Hi. I am not Jen, but Jen thinks I should go up next. So I'm here to talk about everything we're doing around AI and agentic futures. Thank you, Maggie, for that amazing demo. It's always cool how the DataHub team loves doing live demos on screen, and we usually pull it off, so kudos to yet another example of that. With that, I'm going to share my screen and talk about what we're cooking up with AI. Here we go. Alright. A big thing we just started is a community initiative around bringing agents and related assets into the metadata model. This is obviously something near and dear to my heart, as well as to a lot of folks within the team and the community at large. Over the last six months or so, we've had a lot of folks in the community ask: hey, can I register my agents into DataHub, or can I register some MCP tools into DataHub? A lot of the questions and problems tend to be around the fact that these new AI and agentic assets being deployed in companies don't really have a unified catalog, or a catalog of any sort. People are a little worried about not understanding what kind of sprawl is being created, and sometimes about the discoverability of those assets themselves. Maybe there are a lot of very interesting agents that have been built in the company, but people don't know about them, and so they can't take advantage of them. And then, of course, the story we've already seen with data is playing out with AI agents as well: understanding trust, understanding quality, and, of course, understanding lineage.
If you have agent lineage that is disconnected from data lineage, it doesn't tell the full story. So, fresh off the press, we've opened up an RFC for input from the community. It's a GitHub PR, like all our RFCs are, so please comment on it and let us know what you think. Do AI agents make sense in DataHub's metadata model? What else makes sense? So far we've got ideas for agents, agent tools or MCP tools, support for vector stores, as well as for skills. But we want to listen to the community and understand what makes sense. This is, of course, going to be a longer-term project, but we're engaging with the community early because we want your input. There are a few companies in the community, like Grab and a couple of others, that are already giving us input. So join us: if you want to hang out in the AI channel and chat about all things AI and metadata, you're welcome there. With that, I'm going to move over to a few updates. At the previous town hall, we talked about our support for the context graph, essentially the ability to bring unstructured context into DataHub's metadata graph. You're all experts at deploying and rolling out DataHub within your companies, and you've probably seen that understanding data requires context. And when I say context here, I mean business context and technical context, a lot of which is recorded in unstructured format. Some of it lives in disconnected tools like Notion or Confluence or other internal wiki-like tooling. DataHub, of course, recently launched its native version of context docs to bridge that gap, so that if DataHub is all you have, you can easily write any kind of documentation right there; you don't have to be on a dataset entity page to add documentation. You can write about anything that relates to how you run your data practice. But we also wanted to bring in context that lives in other places, because you probably have a ton of docs living elsewhere. And with that, I'm happy to announce that a bunch of features we promised are landing. The first, and maybe the most exciting platform-wise, is support for semantic search. Unstructured context, as you know, tends to be long, textual, and of course also multimodal. We are starting with text first, bringing support for semantic search into the DataHub backend. How it works is that as docs get ingested, they're also chunked and embedded, and we are rolling out support for a few embedding providers as part of the 1.4 release: AWS Bedrock, which is what we run in our cloud systems as well, and we'll also add support for OpenAI and Cohere, because those are popular embedding providers that most folks use. It's very easy to produce these embeddings, and we'll of course have ingestion connectors that help you learn how to produce semantic embeddings for these documents. Hopefully that helps you bring in unstructured context on your own, as well as take advantage of the out-of-the-box connectors we provide. How will you consume the context we produce? Right now we're bringing semantic search support to our API surface area, so that's GraphQL, of course, and then our MCP tools. And for the cloud version, it's integrated into Ask DataHub.
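To make the chunk-and-embed idea concrete, here is a minimal Python sketch of the general pattern, using the OpenAI embeddings API as one of the providers mentioned above. The file name and the fixed-size chunking strategy are made up for illustration; this is not DataHub's internal implementation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A hypothetical unstructured context doc, e.g. a runbook exported from Notion.
with open("runbook.md") as f:
    doc = f.read()

# Naive fixed-size chunking, purely for illustration; the chunking strategy
# used during real ingestion may differ.
chunk_size = 1000
chunks = [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]

# Embed each chunk so it can be indexed for semantic (vector) search.
resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
vectors = [item.embedding for item in resp.data]

print(f"{len(vectors)} chunks embedded, dimension {len(vectors[0])}")
```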
We'll see some demos later around how we can take advantage of all of this context being brought into the same place. Docs.datahub.com is your source of truth for all the content, and someday we might index that into our demo DataHub instance as well. I'm also happy to announce that the Notion connector we demoed in the previous town hall is actually available and will be part of the 1.4.0 release. Its support status is incubating, which means the community can try it out; we have tested it to our level of happiness, but don't create production dependencies on it. We will mature it over the next two to three months and get it into a certified state. What does it do? Like Maggie showed, you go to the ingestion page, it shows up as a document source, and you go through the configuration page and set it up. This is a screenshot from Ask DataHub, but Nick is going to talk about a bunch of other agent integrations, so you can ask open source DataHub questions about all this unstructured data from Notion that you're bringing in, and it'll be able to answer them. Confluence is also a connector we want to launch as part of 1.4; we're going to open the PR right after the town hall. It's going to be in testing phase, which means we are actively testing it as we build it. So if you have a Confluence instance and you want to help make the Confluence connector even better, jump in and help us. We'll move it to incubating status in about a month and then to generally available over three to four months. Same story here: it shows up as a source, you configure it and ingest, and the next thing you know, you can ask questions that span your technical context, your business context, and your operational metadata. Awesome. So that's all I had: a quick update on the promises we made to the community around bringing context from unstructured sources into the graph. We are shipping those, and 1.4 is going to be the vehicle where all of this is available. We had a little bit of a delay on the release to this week, but we're working hard to get it out to y'all early next week. With that, I will transition over to Nick, who's going to talk about the agent context kit.

Hey, everyone. I'm Nick. I'm going to talk about the agent context kit, as well as give an update on some of the work we've done with MCP. We previously talked about the agent context kit at the last town hall, and I wanted to give an update on what's going to be available in the latest open source release. The agent context kit is a suite of AI-optimized tools to help bring DataHub context to any agent platform. It's used directly in our DataHub MCP server, and with the 1.4.0 release it can be used with LangChain agents, which we helped build support for, and Snowflake agents, with more to come. We have examples and guides in the repo, and this includes all of our new tools built for document search, as part of the Notion integration, all of our existing tools for querying lineage, and new tools for editing metadata. So let's talk about the ways you can use the agent context kit to help build an agent with DataHub. In terms of capabilities, let's start.
You could start by integrating the MCP server with Claude and start interacting with DataHub: search across your data, semantically search unstructured documents, understand impact. This is the easiest, simplest way to get started, but you can also build on our tools for Snowflake and LangChain. Well, let's talk about Snowflake. We wanted to build a better experience for Text-to-SQL and improved data analysis inside Snowflake. Snowflake today doesn't support directly integrating with our MCP server, so we had to build an integration with DataHub to be able to pull that data into Snowflake, so that in a Snowflake Cortex agent you're able to access both DataHub data and Snowflake queries at the same time, without having to go to another tool just to bring DataHub data to where you are. And so a little demo here. One of the things we've built, as part of our DataHub CLI, is an agent create command for Snowflake, with other platforms to be added, that basically allows you to generate a Snowflake Cortex agent with all of the DataHub tools as UDFs in Snowflake, so that you can run queries in Snowflake Cortex. It has two modes. The first mode is that you can generate the SQL files that you run in the Snowflake CLI or Snowflake UI as an admin to set the agent up. The second mode is that the CLI on your computer can directly execute the SQL, either using a Snowflake personal access token or opening a browser for SSO to connect to Snowflake directly, to give you a one-click experience for creating a Snowflake Cortex agent with DataHub. I'll give a demo of how that works here. As an example, we ran a query with our fictional bank: show me all loans with a property value of zero or null; these have invalid loan-to-value calculations. This question requires both lookups in DataHub, in order to understand the schema and tables, and executing SQL in Snowflake to get the answer. So it starts by searching for relevant datasets in DataHub, finds the fact loan details table, and fetches the schema to verify the property fields. It's also able to look up documentation from Notion or Confluence to find the documentation about the status, and then it executes Snowflake SQL at the end to determine which values have incorrect data. It determines that there are two loans with invalid data, and we'll have docs available for how to set this up and use it. Next, I want to talk a little bit about building LangChain agents. They can hallucinate and make up business logic, and we wanted to change that by setting up our DataHub SDK so you can plug it into your LangChain agent directly.
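As a rough illustration of what that wiring could look like, here is a minimal Python sketch. The `get_langchain_tools` import is a hypothetical stand-in for whatever helper the agent context kit actually exposes (check its docs for the real module and function names), and the server URL and token are placeholders; the LangChain/LangGraph pieces use the standard `create_react_agent` pattern.

```python
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Hypothetical helper: the real function and module names live in the
# agent context kit; check its docs for the actual API.
from datahub_agent_context import get_langchain_tools  # hypothetical import

# Placeholder DataHub server URL and token.
tools = get_langchain_tools(server="https://my-datahub.example.com", token="...")

# Build a standard tool-calling agent and hand it the DataHub tools.
model = ChatOpenAI(model="gpt-4o-mini")
agent = create_react_agent(model, tools)

result = agent.invoke(
    {"messages": [("user", "What went wrong with adding customer_id to the monthly aggregations?")]}
)
print(result["messages"][-1].content)
```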
Here's an example where we have a basic LangChain agent with the DataHub agent context SDK, and we asked it what went wrong with adding customer ID to the monthly aggregations. It was able to look up the documentation in Notion and Confluence to figure out that there was an issue with the aggregation monthly loans table due to a row count explosion that happened in January 2024, and it was able to query the tables and determine the row explosion giving incorrect sums, and the number of rows before and after, with three times the difference. So this is just one example of the ways we're able to plug our data directly into LangChain. This library is available and it's super easy to integrate with LangChain: you just enable it with your DataHub client, build the LangChain tools, and then create your agent so it has the tools directly available, and we have docs for this as part of the agent context kit. And then lastly, I just want to go over the availability of everything. Snowflake Intelligence support should be available in 1.4.0, and LangChain support is available in 1.4.0. We're going to have Google ADK, OpenAI, CrewAI, and other popular frameworks coming soon. And all of this is also available in the open source and cloud MCP servers as part of the 1.4.0 release. I will hand this off to John to talk about Ask DataHub plugins. Thank you, everyone.

Awesome. Thank you, Nick. Super excited to see how we're helping you get DataHub context into all of your agent tools. I'm going to share my screen. Nick, do you mind stopping the screen share? Thank you. Sorry about that. All right, hopefully everyone can see. I'm going to switch gears and talk a little bit about DataHub Cloud, what we've been working on over the past few weeks, and what's coming soon. Last time we talked about Ask DataHub, which is our chat assistant built directly into DataHub Cloud. Maggie covered this in ingestion and in the entity profile, but essentially it helps you find the right data, understand the impact of data changes using lineage, answer business questions, and manage context and ownership, and it's available across three surface areas: DataHub, Slack, and Teams. As of the last release of DataHub Cloud, v0.3.16, we've also introduced the document capability, which is what Shirshanka covered earlier: the ability to create and manage context documents, which can be runbooks, FAQs, definitions, things that span multiple pieces of data. You can integrate documents from external sources; Notion and Confluence are the two that were demoed earlier. You can make context accessible to agents via semantic search and DataHub's MCP server. You can track the changes to documents over time, you can apply fine-grained access control, and you can organize and link documents to data assets, business terms, domains, and more. Now, what we found in rolling out Ask DataHub is that you're able to achieve workflows that depend on the metadata in DataHub, like change management and finding the right data. But what we're really looking to do is expand our horizon to support more end-to-end workflows for the data practitioner. And the way we believe we need to do that is by pulling together context that spans the different tools and technologies that data scientists, data engineers, and data analysts are using on a day-to-day basis.
And so a few of the use cases and workflows we've been thinking about in recent weeks are, first, debugging metric incidents and root cause analysis, which generally requires that you not only look at DataHub or your data quality tool, but also issue some sample or triaging queries against a platform like Snowflake, maybe investigate job logs in something like dbt or Airflow, and then check PRs in GitHub to understand if something has changed recently. So it's a workflow that fundamentally spans multiple tools and technologies. We've been looking at Text-to-SQL and answering operational business questions in plain English, where you have to find the right tables, maybe using DataHub, construct and execute queries, and synthesize the results, maybe using something like Snowflake or Databricks. And then a use case like handling GDPR deletion requests, where you have to understand your internal policy around how to handle GDPR takeout requests; find the tables, columns, and documents that contain user IDs or user information that needs to be cleaned; and construct queries to go actually find that data and do something with it, typically removing or obfuscating it, which would happen in a platform like Snowflake. So over the last few weeks, we've been thinking about how we can evolve our Ask DataHub product to help data engineers, data analysts, and data scientists run these end-to-end workflows with Ask DataHub. What we ended up building is a way to bring context and capabilities into Ask DataHub directly, so that we can unify context across these different tools and help you answer those deeper questions or complete those deeper workflows. What we've added is a feature called AI plugins, which allows you to bring GitHub, dbt, Snowflake, Glean context, and more into Ask DataHub. That will hopefully enable you to run those workflows that span tools, like data debugging, Text-to-SQL, and the GDPR-related use cases as well. This plugin system supports any MCP-compatible server or tool, so you can also bring in your own context if you've created MCP servers, and we can make those available to Ask DataHub as well. And we're secure by default, with support for OAuth 2.0, so that every user links their own credentials to GitHub, to Snowflake, to dbt, and is only able to query the data or see the things they're allowed to see in those tools. And so now I want to give a quick demo to showcase the power of bringing all this context into one place. We're going to look at an example where we're trying to debug a data quality issue after finding the right data to use, so this is a full end-to-end workflow. First, I'm trying to understand which table to use to understand the total loan value originated at my bank; again, we're using that same bank example. DataHub is going to go and use the metadata to find the right table, and it says: hey, we've got this agg monthly loans table you can use. And then I'm going to ask: is this table healthy? Am I able to trust this table right now? And you can see that DataHub is telling me it's only partially healthy; it's actually failing an assertion right now. We expect between zero and thirty rows, and actually there are more rows, and it just started failing. So this is now turning into a debugging use case where I want to understand what's wrong with the table.
So I'm going to ask DataHub: hey, can you tell me if something recently changed in the code, and did the dbt model build successfully last time? Because we know this is a dbt model. What DataHub is going to do is go out and search across context in GitHub and dbt, and it comes back and says: actually, yes, something did recently change and it introduced a bug. It specifically refers to a PR that was merged on January 27 that added a new column to the table, which introduced a join fanout bug. It'll even summarize what the problem was for us here, and it gives me an example and everything as well. You can see it also checks dbt, and it can see that, yes, the dbt model did build. So this was a logic bug; it wasn't a bug that broke the job run. And now I'm going to go a step further and say: hey, Ask DataHub, can you raise a PR to just revert that change and fix this bug? And DataHub is going to go out, using the GitHub integration, and create a PR to fix the bug. You can see I can go over to GitHub, where I have a PR here. It's actually raised in my name, because I connected my own personal GitHub account, and it's going to fix it. And I'm just going to go in and LGTM, ship it. Oh, looks like I can't. And now I'm going to ask DataHub one more question: hey, can you just link me to the original PR that caused the problem? And this is the original PR. You can see we've got our Mr. Sofrito guy here who added this logic, which was simply incorrect. It was a bug, and it ended up causing this fanout and tripping this data quality assertion. Now I'm just going to show you the experience around configuring these plugins. You'll see that as a user, once my admin has connected to Snowflake or GitHub or dbt, or allowed me to connect, I'll see a new tab called My Integrations where I can turn on the connection for these different tools, and instantly Ask DataHub will be able to go out and access the context across these different tools on my behalf. Awesome. So, a quick recap of what we just saw: Ask DataHub plugins is a way to unify context and tools from all different types of data platforms: GitHub, Snowflake, dbt, Glean, and more. Now I want to quickly talk about how you actually set up a plugin in DataHub. This is a process that has to be initiated by your DataHub admin, and it's a three-step process. First, admins will go and get the MCP server URL and the API key or OAuth client configuration from a third-party app like Snowflake, dbt, or GitHub; I'll show you what that looks like in Snowflake in just a minute. Then the admin will create a plugin in DataHub settings; there's a new page, Integrations, and then AI Plugins, where you can actually configure these things. And then, finally, the end user will see these in their personal integration settings, where they can connect their personal account to those different tools. After that, you'll be able to start asking DataHub questions that span these different tools. I'm going to quickly demo what this looks like for Snowflake. To create an MCP server in Snowflake, we're actually going to go into Snowflake as an admin and just run a few commands. We're going to create the MCP server first; I think it went a little bit quick there. And then we're going to create what's called an OAuth integration in Snowflake.
That's where we're going to get our credentials to plug into DataHub, and then we're going to list them out here so we can copy them into DataHub. There are a few things we're going to need: a client ID, a client secret, and a few other things. All of these can be found right in Snowflake, and we're going to share a doc that shows you exactly how to get this information. At the end of this process, this is the set of things you'll get from Snowflake: a URL to connect to it, a client ID, a client secret, an authorization URL, and a token URL. Those five things are what you need to go into DataHub and set up the MCP plugin. So we'll go back into DataHub and show the process of creating an AI plugin. Here I'm going to say, yes, I want to connect to the Snowflake Fiction Bank database, and I'm going to paste in all of the stuff I found here in Snowflake: the URL, the client ID, the client secret, etcetera. Let's do that quickly here. Okay, we've got a few last things to do, like specifying the scopes we want, such as the refresh token scope, and we want users to use the DataHub role. And then, finally, we can provide custom instructions for this specific plugin, which inform Ask DataHub about how it should use the plugin. In this case, we're going to say: use this plugin to perform SQL queries against the Fiction Bank database in Snowflake. So once I do that, users will be able to connect their account to Snowflake using OAuth, which is what we've just done here, and then they'll be able to go into Ask DataHub and ask Text-to-SQL questions. In this case, I'm going to ask: what was the total loan value originated in 2023? That requires us to understand the right table to use to answer the question, using all of the rich metadata inside DataHub, and then go into Snowflake and actually execute the query, an aggregation query, and give us the result. So you can see it tells me the total loan value originated in 2023 was $8,100,000, and here's how it was calculated. All right. What we're going to be rolling out over the next few weeks is documentation that helps you configure this for various partners; Snowflake, dbt, GitHub, and Glean are the ones we're really targeting in the first few weeks. We've also linked all of the partner documentation, so this will take you to the Snowflake guide about how to create an MCP server, which you can then plug into DataHub when you're setting up the plugin. All right, when will this be available? The big question. Ask DataHub plugins will be available in private beta in the next cloud release, which is targeted for February or early March. It's being tested with GitHub, Snowflake, dbt Cloud, and more. It will have OAuth 2.0 support so that users can link their own accounts, as we saw with Snowflake. It will support shared API keys, so if you want all users to share one API token for a given plugin, that will also be supported. And it will support personal API keys, so if you want to require that every user generates an API token from dbt and puts it in there, you'll be able to do that. HTTP, SSE, and WebSockets will be the protocols we support for MCP, compliant with the MCP spec as of November. You'll be able to plug in custom headers and custom plugin instructions to support your unique MCP server.
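For a sense of what an MCP-compatible plugin connection involves under the hood, here is a minimal Python sketch using the MCP Python SDK's SSE client to connect to a server and list its tools. The server URL and bearer token are placeholders, and this is a generic illustration of the protocol handshake rather than Ask DataHub's actual plugin code.

```python
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

# Placeholder values: in practice the URL comes from the plugin configuration
# and the token from the user's OAuth flow (or a shared/personal API key).
SERVER_URL = "https://example.com/mcp/sse"
ACCESS_TOKEN = "user-scoped-oauth-token"

async def main() -> None:
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    # Open an SSE transport to the MCP server, then run the MCP handshake.
    async with sse_client(SERVER_URL, headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # An assistant would expose these tools to the model; here we just list them.
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```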
And then we'll have admin and user configurability, as we just showed in the demo. If you're interested in learning more about AI plugins and MCP plugins, please do reach out to your DataHub rep; we'd be happy to walk you through it and get you started. Now I'll just take one last minute here to talk about the road ahead for this particular set of features. We're continuing to evolve and improve the Ask DataHub agent experience. We're working to implement fine-grained tool configuration so that users can enable and disable specific tools from those MCP plugins. We're working towards a tool-call audit logging dashboard so that admins can understand who's doing what, when. We're adding tool-call approve or deny, so you can approve or deny every single tool call if you'd like. And then we're exploring support for the latest and greatest part of the MCP protocol, called MCP apps, which actually allows you to render inline applications that users can interact with right inside the chat experience. So I'm super excited for all of the great things that are on the roadmap and coming ahead, and I'm really excited to get some feedback from the customer base on this particular set of features. All right, with that, I think we're going to end a little bit early; we may have some time for Q&A. Thank you all for the attention. This is maybe the first time I can ever remember in DataHub history that we've finished seven minutes early. Really happy about that. Thanks, everybody.