HexT5: Unified pre-training for stripped binary code information inference

Xiong et al., 2023

Document ID: 988185210181803975
Author: Xiong J; Chen G; Chen K; Gao H; Cheng S; Zhang W
Publication year: 2023
Publication venue: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Snippet

Decompilation is a widely used process for reverse engineers to significantly enhance code readability by lifting assembly code to a higher-level C-like language, pseudo-code. Nevertheless, the process of compilation and stripping irreversibly discards high-level …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/51—Source to source
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/76—Adapting program code to run in a different environment; Porting
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Jin et al.	2022	SymLM: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings
Ashizawa et al.	2021	Eth2Vec: learning contract-wide code representations for vulnerability detection on Ethereum smart contracts
Shen et al.	2020	A survey of automatic software vulnerability detection, program repair, and defect prediction techniques
Al-Kaswan et al.	2023	Extending source code pre-trained language models to summarise decompiled binaries
Xiong et al.	2023	HexT5: Unified pre-training for stripped binary code information inference
Rahimian et al.	2015	BinComp: A stratified approach to compiler provenance attribution
Jin et al.	2023	Binary code summarization: Benchmarking ChatGPT/GPT-4 and other large language models
Tian et al.	2021	BinDeep: A deep learning approach to binary code similarity detection
Qiu et al.	2024	Vulnerability detection via multiple-graph-based code representation
Zhu et al.	2025	Callee: Recovering call graphs for binaries with transfer and contrastive learning
Zhao et al.	2019	Suzzer: A vulnerability-guided fuzzer based on deep learning
Artuso et al.	2024	BinBert: Binary code understanding with a fine-tunable and execution-aware transformer
Xue et al.	2019	Hecate: Automated customization of program and communication features to reduce attack surfaces
Mastropaolo et al.	2024	Toward automatically completing GitHub workflows
Liu et al.	2023	An empirical study of smart contract decompilers
Liu et al.	2022	Autoupdate: Automatically recommend code updates for Android apps
Huang et al.	2023	BCGen: a comment generation method for bytecode
Mammadov et al.	2024	Learning program behavioral models from synthesized input-output pairs
Khoo	2013	Decompilation as search
Lyu et al.	2021	Sparrowhawk: Memory safety flaw detection via data-driven source code annotation
Xia et al.	2023	Binary code similarity analysis based on naming function and common vector space
Artuso	2025	Deep learning based binary code analysis
Li et al.	2019	Adabot: Fault-tolerant Java decompiler
Zhang et al.	2025	BinQuery: A Novel Framework for Natural Language-Based Binary Code Retrieval
Pakshad et al.	2025	Are Textual Prompts in Large Language Models Sufficient for Vulnerability Detection?