Name: Mahika Vajpeyi
Class Year: 2021
Major: Economics and Computer Science
Hometown: Ghaziabad, India
Internship Organization: Tresata
Job Title: Data Engineering Intern
Location: N/A (remote)
For someone studying economics and computer science, data science presents the perfect opportunity to combine programming skills with the qualitative and quantitative skills taught in the social sciences. Since I had little experience in the field, I decided to spend this summer taking a deep dive into data and the science behind it. That’s how I landed at Tresata.
Tresata is an ‘AI’tomation company that provides its AI-powered software to other companies so they can monetize their data and automate intelligence to better serve their customers and ultimately further Tresata’s mission to "enrich life"®. However, before companies can begin using Tresata software, their data need to be engineered, that is, converted into a format that Tresata software can work with. It is this task of data engineering that I worked on this summer. As a data engineering intern, I spent the first two weeks learning Apache Zeppelin (a tool for data analysis and visualization), Scala and SQL (programming languages for data manipulation), and the fundamentals of Hadoop (a software system for efficiently storing and processing massive amounts of data). Next, I collaborated with 12 other interns to create a repository of 800-plus COVID-19 datasets, and I evaluated and curated select datasets to be used in the company’s COVID-19 Active Transmission (COAT) app. Additionally, I worked in a small group with four other interns to create an executive summary of select datasets that COAT app developers could use to enhance the app. Finally, I spent the last three weeks on a data-driven COVID-19 project in which my team built network graphs in Zeppelin to analyze contact-tracing data and study how the virus spread in South Korea and India.
Reflecting on the experience, I am amazed at how much I learned throughout the summer. The internship program improved my data analysis and interpretation skills, which are crucial to the career paths I am considering: data science and research. Curating and profiling datasets, as well as working on the data-driven project, gave me experience designing insightful visuals using SQL and Tableau, which is a desirable skill for aspiring data scientists and researchers. Additionally, coding in Scala gave me a chance to practice algorithmic thinking, and knowledge of the language will be particularly helpful if I pursue a career in data science.
Even though I learned a lot, my favorite part remains the conversations I had with fellow interns and full-time staff and the connections I built. When I first learned that the internship would be remote, I was upset by the immediate thought that came to mind: I imagined myself silently typing away at my keyboard eight hours a day, five days a week. However, the experience turned out to be the complete opposite of what I had anticipated. I attended an average of three hours of meetings each day for group projects, team-bonding events, etc., which led me to develop a strong sense of connection to my colleagues even though we met only virtually.
Overall, it was a wonderful learning experience, and I hope to get another chance to work with the people I met sometime soon.
Visit the Summer 2020 Internships page to read more student stories.