By Susannah Barnes
Published on July 7, 2022
Alation launched the Data Intelligence Project in the summer of 2021 to train the next generation of data leaders. Today, we offer our software free of charge to faculty and students at academic institutions as a learning tool. With Alation, students learn the critical skills they need to curate, govern, and discover data assets in the data-driven enterprises of today.
But how did we arrive here? As I’ll share, the ideas that led to last year’s launch were germinating for years in two very different places.
The first was in Madison, Wisconsin. In 2013 I joined American Family Insurance as a metadata analyst. I was changing careers and had just completed a degree in Library and Information Science. I had always been fascinated by how people find, organize, and access information, so a metadata management role after school was a natural choice.
For the uninitiated, metadata is “data about data” – and it’s everywhere. Metadata captures the history of digital assets, with details like author, date created, and file size. Just as an Amazon product page conveys details about a consumer good for the curious consumer, so too a data catalog captures and presents metadata to aid human comprehension. Modern enterprises use metadata to make data assets easier to find, organize, and access, and American Family hired me to work with a team building metadata services.
I quickly realized that my education had prepared me in theory but not in practice. I had theoretical knowledge to apply to my work, but no practical experience to ease my transition into the workforce (working parents don’t always have the option for internships). The terms people used were identical to what I’d studied, but the meaning and context of the information were completely different. I spent the next six months taking notes, asking questions, and mastering a different kind of metadata and information management than what I had learned in school.
While I was learning on the job in Madison, Aaron Kalb was in the San Francisco Bay Area co-founding Alation. As Alation worked to create a new category of enterprise data management tool, the data catalog, Aaron wanted to also use this new technology to advance the cause of academic research. How? By providing Alation as a tool for data experts and researchers to access, share, and document high-quality public data sets and research findings across institutions. He even had a name for it: Alation Open.
However, Aaron faced two big hurdles. First, launching a startup is massively time-consuming, and second, Alation Open needed a partner champion to get the program off the ground. It would take a few years before Aaron could turn his attention back to Alation Open and refocus his efforts on how to use Alation for social good.
In 2018, American Family Insurance became an Alation customer and I became the product owner for the AmFam catalog program.
By now, I was confident in my role. I knew right away that a tool as intuitive as Alation could be used in the classroom to give students hands-on experience with an enterprise-quality application. I pitched the idea to Alation and to the University of Wisconsin – Milwaukee (UWM), my alma mater, where I occasionally guest lectured on metadata. Aaron agreed to speak with UWM, and the first pillar of Alation Open was launched as the Data Intelligence Project. Aaron finally had the partner champions he needed.
Dr. Maria Haigh, a Fulbright scholar and teacher at UWM, worked with Deb Seys, our Senior Director of Learning and Communities at Alation, to create a lesson plan and tailor Alation for the classroom. Starting in the summer of 2020, students began using Alation to learn how to work with data and communicate around it effectively.
What began for me as a wish that I had been better prepared for a corporate job was now a fully supported program at Alation. I joined Alation this past March as the Data Intelligence Project Lead to mature the project, growing it from a program at a single school into one that supports multiple institutions.
The Data Intelligence Project develops the next generation of data leaders through student and faculty access to our data catalog and to data management experts. But who are these rising data leaders? What roles will they be filling, not just in data-driven businesses, but in a data-driven society? To answer these questions, we need to look at how data roles within the job market have evolved, and how academic programs have changed to meet new workforce demands.
In the 2010s, the growing scope of the data landscape gave rise to a new profession: the data scientist. This new role, combined with the creation of data lakes and the increasing use of cloud services, created new employment opportunities in data analytics, data architecture, and data management.
Enterprises were collecting vast ecosystems of data, and began regarding them, for the first time, as worlds worthy of exploration. Who would uncover secrets from these unknown landscapes? The data scientist. In 2012, Davenport and Patil declared data scientist “The Sexiest Job of the 21st Century.” The profession was growing rapidly with little formal training or standards in place. Universities were only just beginning to plan formal academic data science programs, and the skills to be taught in those programs were still being identified.
We’ve made incredible progress. This year, there are more than 900 academic programs offering training in data science. LinkedIn’s 2020 Emerging Jobs Report lists Data Scientist at #3 with 37% annual growth. The closely related role of AI Specialist is listed at #1 with 74% annual growth. The Bureau of Labor Statistics projects employment of data scientists to grow 22% from 2020 to 2030.
It is clear that the need for data scientists and experts is not going away. Companies competing for data talent must demonstrate a commitment to building a modern data stack and to supporting a strong internal community of data professionals to attract top prospects.
The rapid growth of data roles critical to data-centric business models demonstrates an awareness of this need. Data engineer and cloud engineer both sit in the top 15 of LinkedIn’s 2020 Emerging Jobs Report, and Business Insider states, “seven of the top 10 industries that will grow most in-demand in 2022 require at least a baseline understanding of data, including AI specialist, data scientist, robotics engineer, full-stack engineer, cloud engineer, cybersecurity specialist, javascript developer.”
Demand for data stewards and data catalogers is increasing steadily, particularly in entry- to mid-level roles, as companies build out robust data governance programs to support data analytics initiatives. Knowledge related to data cannot simply be collected in a common location; it requires the specialized skills of librarians and data stewards to add the descriptive metadata that makes data assets findable and usable, shortening the time to value for data products.
Academic programs have met the demand for these new roles by adding degree programs and certificates for students. Multiple universities have added M.S. degrees in Data Engineering over the past two years, while others have expanded existing M.S. and B.S. programs in Information Science and Data Analytics. Community colleges are adding data specialization IT tracks to support data-specific infrastructure and tool roles.
But how can these programs ensure that graduates enter new jobs fully prepared to hit the ground running? My own experience had revealed the painful gap between learning data in theory and using it in practice. How could we bridge that divide?
To maximize the value of organizational data, companies need to reduce the time it takes for data scientists and data analysts to find the data they need and put it to use. Currently, data analysts and data scientists spend up to 80% of their time cleaning and organizing data, leaving just 20% of their time for analysis. This significantly limits the time to value of data science and analytics projects.
Why is so much time wasted? A lack of data literacy slows down the process. As Kon Leong, co-founder and CEO of ZL Technologies, points out, “the mission of enabling data analytics in today’s enterprise is hobbled by the lack of the requisite skills in the marketplace, including: advanced statistics/mathematics, new analytics methodologies, advanced systems analysis, business fundamentals, regulatory and legal understanding, and general IT technical and data architecture skills.”
Another limiting factor is that of context. Once data is found and cleaned, data scientists and analysts still need to understand the methods by which the data was collected, the limitations on proper use, and any other contextual information that may impact the insights derived from a particular data set. Data sets need to be properly curated and cataloged by data architects and subject matter experts in order to capture this critical information.
For data scientists and data analysts to create analytic insights with total context, information must be understood and managed in a centralized data intelligence platform like a data catalog. The information found in the data catalog brings together the knowledge and expertise of system specialists, data architects, data stewards, and data catalogers for the benefit of the analytic community.
A data ecosystem requires support and nurture. It takes numerous contributors working together and sharing knowledge in a centralized information space to keep a complicated data ecosystem from collapsing. The data catalog is a common touchpoint for people consuming data insights, providing analyses, building data pipelines, and supporting the technical infrastructure of data systems and tooling. As such, it’s a natural learning environment.
The Data Intelligence Project not only introduces students to an enterprise data catalog tool, but also supports the teaching of data literacy across multiple data disciplines and academic training programs, so that students entering the workforce have an understanding of the full breadth and interconnectivity of a modern data stack.
With this program, Alation teaches students the language of data in theory and in practice, so when they graduate and go on to work in data-centric companies, they’re prepared to translate knowledge into action. Honestly, it’s the kind of program I wish had been available to me as a graduate student!
Data collection has exploded, and this poses both challenges and opportunities. The vast volumes of data created by IoT, web interactions, and digital applications have given rise to new, data-centric roles. Supporting the development of analytic insights at scale requires a data-literate workforce capable of understanding the complexity of modern data systems. Those who can analyze and communicate around data are essential actors for innovation in virtually every sector. We must equip future generations with the skills to understand and use data with confidence if we are to meet humanity’s most pressing challenges, from cancer to climate change. The Data Intelligence Project’s purpose is to empower colleges and universities to do just that.
Do you want to bring the Data Intelligence Project to your institution? Apply today!