DataLynn
New York, NY
Data Scientist Intern
Jul 2024 - Sep 2024
Used GPT-4 via the OpenAI API to process dialogue from online interview videos and extract key information such as skills and project highlights; cleaned and preprocessed the data, with Git for version control, improving candidate matching efficiency by 20%.
Fine-tuned prompt generation for dataset creation, using Python's ThreadPoolExecutor to process large lists of question prompts in parallel, optimizing resource allocation and reducing computation time by 15%.
Encoded textual data with SentenceTransformer and compared the embeddings by cosine similarity to assess response consistency across multiple instances, providing more reliable evaluations of candidate responses.
Engineered features such as sentiment scores and keyword frequencies, enriching the response datasets and improving the accuracy of insights derived from large language models (LLMs).
Mountain View, CA
Data Analyst Intern
Aug 2023 - Nov 2023
Designed and optimized a database schema to store metadata and labels, utilizing SQL to extract insights on popular movie categories and Python to analyze trend and seasonality patterns.
Researched recommender system algorithms, focusing on collaborative filtering techniques and addressing similarity matching challenges for a dataset of over 45,000 movie records.
Built a classification model for movie reviews by implementing natural language processing (NLP) techniques, such as feature extraction, and leveraging TensorFlow for ranking and retrieval to boost model precision.
Enhanced Word2Vec model efficiency by refining semantic information, reducing data volume, and improving performance through synonym association and vocabulary prediction.