Tang et al., 2023

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

Document ID
15649746882063966615
Author
Tang X
Liu Y
Cai Z
Shao Y
Lu J
Zhang Y
Deng Z
Hu H
An K
Huang R
Si S
Chen S
Zhao H
Chen L
Wang Y
Liu T
Jiang Z
Chang B
Fang Y
Qin Y
Zhou W
Zhao Y
Cohan A
Gerstein M
Publication year
2023
Publication venue
arXiv preprint arXiv:2311.09835

Snippet

Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper comprehension of …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/34 Graphical or visual programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/36 Software reuse
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/44 Arrangements for executing specific programmes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/76 Adapting program code to run in a different environment; Porting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Implementation of user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/10 Requirements analysis; Specification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G06N5/02 Knowledge representation
    • G06N5/022 Knowledge engineering, knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation, e.g. computer aided management of electronic mail or groupware; Time management, e.g. calendars, reminders, meetings or time accounting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications

Similar Documents

Publication | Title
Du et al. | DependEval: Benchmarking LLMs for Repository Dependency Understanding
US10353796B2 (en) | System and method for using development objectives to guide implementation of source code
Arulmohan et al. | Extracting domain models from textual requirements in the era of large language models
Iovino et al. | On the Impact Significance of Metamodel Evolution in MDE
Wilken | Angular in Action
Tang et al. | ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
Oumoussa et al. | Evolution of microservices identification in monolith decomposition: A systematic review
US20180060779A1 (en) | Method of generating business process model and computerized system associated therewith
Ray | A Review on Vibe Coding: Fundamentals, State-of-the-art, Challenges and Future Directions
Hemmat et al. | Research directions for using LLM in software requirement engineering: A systematic review
Ramackers et al. | From prose to prototype: synthesising executable UML models from natural language
Lutalo | Software Language Engineering-Text Processing Language Design, Implementation, Evaluation Methods
Kim | Comparing proficiency of ChatGPT and Bard in software development
Capdepon et al. | Migration Process from Monolithic to Micro Frontend Architecture in Mobile Applications
Khan et al. | Developing Multi-Platform Apps with Visual Studio Code
Sänger et al. | Large language models to the rescue: Reducing the complexity in scientific workflow development using ChatGPT
CN116107524B (en) | Low-code application log processing method, medium, device and computing equipment
Flores | A Two-Level Model-Driven Engineering Approach for Reengineering CI/CD Pipelines
Pedemonte et al. | Towards automatic functional test execution
Miao et al. | Paper2agent: Reimagining research papers as interactive and reliable AI agents
Zhou | Fine-Tuning Large Language Models for Practical Software Engineering: Case Studies in Automated Patch Generation
Ignaim | EvoSPL: An evolutionary approach for adopting software product lines in the automotive industry
Aigner et al. | Kotlin in Action
Tang et al. | ML-Bench: Evaluating Large Language Models for Code Generation in Repository-Level Machine Learning Tasks
EP4629056A1 (en) | Video analytics pipeline development system with assistive feedback and annotation