Tang et al., 2023 - Google Patents

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

Document ID: 15649746882063966615
Author: Tang X; Liu Y; Cai Z; Shao Y; Lu J; Zhang Y; Deng Z; Hu H; An K; Huang R; Si S; Chen S; Zhao H; Chen L; Wang Y; Liu T; Jiang Z; Chang B; Fang Y; Qin Y; Zhou W; Zhao Y; Cohan A; Gerstein M
Publication year: 2023
Publication venue: arXiv preprint arXiv:2311.09835

Snippet

Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper comprehension of …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/34—Graphical or visual programming
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/36—Software reuse
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/76—Adapting program code to run in a different environment; Porting
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/38—Implementation of user interfaces
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/10—Requirements analysis; Specification techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/20—Software design
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications

Similar Documents

Publication	Publication Date	Title
Du et al.	2025	DependEval: Benchmarking LLMs for Repository Dependency Understanding
US10353796B2 (en)	2019-07-16	System and method for using development objectives to guide implementation of source code
Arulmohan et al.	2023	Extracting domain models from textual requirements in the era of large language models
Iovino et al.	2012	On the Impact Significance of Metamodel Evolution in MDE.
Wilken	2018	Angular in action
Tang et al.	2023	ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
Oumoussa et al.	2024	Evolution of microservices identification in monolith decomposition: A systematic review
US20180060779A1 (en)	2018-03-01	Method of generating business process model and computerized system associated therewith
Ray	2025	A Review on Vibe Coding: Fundamentals, State-of-the-art, Challenges and Future Directions
Hemmat et al.	2025	Research directions for using LLM in software requirement engineering: A systematic review
Ramackers et al.	2021	From prose to prototype: synthesising executable UML models from natural language
Lutalo	2024	Software Language Engineering-Text Processing Language Design, Implementation, Evaluation Methods
Kim	2024	Comparing proficiency of ChatGPT and bard in software development
Capdepon et al.	2023	Migration Process from Monolithic to Micro Frontend Architecture in Mobile Applications.
Khan et al.	2020	Developing Multi-Platform Apps with Visual Studio Code
Sänger et al.	2023	Large language models to the rescue: Reducing the complexity in scientific workflow development using ChatGPT
CN116107524B (en)	2023-07-18	Low-code application log processing method, medium, device and computing equipment
Flores	2024	A Two-Level Model-Driven Engineering Approach for Reengineering CI/CD Pipelines
Pedemonte et al.	2012	Towards automatic functional test execution
Miao et al.	2025	Paper2agent: Reimagining research papers as interactive and reliable ai agents
Zhou	2024	Fine-Tuning Large Language Models for Practical Software Engineering: Case Studies in Automated Patch Generation
Ignaim	2021	EvoSPL: An evolutionary approach for adopting software product lines in the automotive industry
Aigner et al.	2024	Kotlin in action
Tang et al.	0	ML-Bench: Evaluating Large Language Models for Code Generation in Repository-Level Machine Learning Tasks
EP4629056A1 (en)	2025-10-08	Video analytics pipeline development system with assistive feedback and annotation