Hi! I'm Ardy. I'm currently working on AI capabilities and risk propensity evaluation, specifically on building better evaluation tools and environments. I obtained my MS in CS from the University of Southern California and my BS in Informatics from Institut Teknologi Bandung, Indonesia.
Before my master's, I worked as an AI Engineer for three years at Bukalapak, one of Indonesia's biggest e-commerce companies. I worked on their agentic LLM chatbot, recommendation system, and fraud detection platform.
Awards
2nd Place · AI Manipulation Hackathon
Apart Research
Solo team. Empirical evaluation of frontier LLMs' susceptibility to adversarial web content injection in product recommendation. Designed a benchmark across 10 product categories and quantified deception rates.
2nd Place · AI Safety Startup Hackathon
Apart Research
Solo team. Proposed a startup to scale training and alignment of agentic AI in cyber-physical systems through domain-specific simulations and human feedback pipelines.
Fulbright Scholarship
U.S. Department of State
Sponsored by the U.S. Department of State for an M.S. in Computer Science at the University of Southern California.
Education
M.S. Computer Science · University of Southern California
Aug 2024 – Dec 2025
GPA: 3.75. Fulbright Scholar. Coursework: Trustworthy Large Foundation Models, Advanced NLP, Probabilistic and Generative Models, Machine Learning.
B.S. Informatics · Institut Teknologi Bandung
Aug 2017 – Oct 2021
GPA: 3.73. Thesis: Domain Adaptation in Aspect Based Sentiment Classification using BERT Model.
Work Experience
AI Research Engineer · Bukalapak
Jun 2021 – May 2024
Built the company's first agentic LLM RAG chatbot for customer service. Owned 50+ data pipelines for recommendation systems. Achieved 30%+ infrastructure cost reduction. Developed fraud detection models reducing forbidden product takedown time by 50%.
AI Engineer · Supertype
Sep 2020 – Feb 2021
Built a web app for Google Play product sentiment analysis. Handled design, development, and deployment end to end using Python, Flask, and Altair.
Software Engineer Intern · Shopee
May – Aug 2020
Developed the mobile frontend for Shopee's gamification campaign in React, deployed across 10+ countries.
Research Projects
Social Media Narrative Monitoring Through Knowledge Graphs
Feb – Apr 2025
SPAR Project. Under Kellin Pelrine at Mila / Complex Data Lab. Built a platform to extract narratives and public figure mentions from social media, converting them into a queryable knowledge graph for analysts.
Indonesia's AI Roadmap Risk Assessment
Jul – Sep 2025
AISES course project. Developed an LLM-powered literature review pipeline to analyze the gap between Indonesia's National AI Roadmap and the MIT Risk Repository. Found strong coverage of cyber/misuse risks but near-zero mention of AI control and system safety.
Eliciting Ranking Bias and Deception on Generative Search Engines
2024 – 2025
Evaluated LLM vulnerabilities to adversarial prompts in RAG platforms for product recommendation. Demonstrated that attacks transfer across models (GPT-4.1, GPT-5.1) and consumer platforms (ChatGPT). Extended to study deception in factual claims.
Compact CoT Exploration with Multi-token Forward Passes
2024 – 2025
Proposed fused-token forward passes (FTFP), an alternative to next-token embedding that uses a probability-weighted combination of top-k token embeddings. Replicated Meta's Coconut paper as a baseline decoding method.
Stock Movement Prediction from Social Media & Company Correlations
2024
Reproducibility project. Lead contributor. Used graph attention networks with multimodal data (prices, tweets, inter-stock graphs). Improved results by adding residual connections and LayerNorm to combined embeddings.
Activation Engineering Library (Patchscope)
Jan 2024
AI Safety Camp. Participated in designing a modular activation engineering library based on Google PAIR's Patchscope paper, providing APIs for logit lens and future lens interactions with model internals.
Measuring Unlearning in LMs Under Few-shot Learning
Jun – Aug 2022
ML Safety Scholars. Evaluated unlearning capabilities in language models and uncovered inverse scaling phenomena. Measured training iterations needed to learn new word meanings while forgetting prior harmful information. Compared GPT-2 and InstructGPT.