Opcode Summary
| Property | Value |
|---|---|
| Opcode | 0x35 |
| Mnemonic | CALLDATALOAD |
| Gas | 3 |
| Stack Input | idx |
| Stack Output | msg.data[idx:idx+32] |
| Behavior | Reads 32 bytes of the current call’s input data (calldata) starting at byte offset idx. If idx + 32 > calldatasize, the out-of-bounds bytes are zero-padded on the right. Never reverts. |
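The behavior row can be made concrete with a small Python model. This is only a sketch of the semantics described in the table, not EVM code; the helper name `calldataload` and the sample calldata are illustrative choices.

```python
def calldataload(calldata: bytes, idx: int) -> int:
    """Model of CALLDATALOAD: 32 bytes starting at idx, zero-padded on the right."""
    word = calldata[idx:idx + 32]  # slicing past the end yields fewer (or no) bytes
    return int.from_bytes(word.ljust(32, b"\x00"), "big")


# Selector of transfer(address,uint256) followed by a single 32-byte word (value 100)
calldata = bytes.fromhex("a9059cbb") + (100).to_bytes(32, "big")

selector = calldataload(calldata, 0) >> 224       # top 4 bytes of the first word
in_bounds = calldataload(calldata, 4)             # the whole argument word
straddling = calldataload(calldata, 5)            # 31 real bytes + 1 padded zero byte
past_end = calldataload(calldata, len(calldata))  # fully out of bounds

print(hex(selector), in_bounds, straddling, past_end)
# prints: 0xa9059cbb 100 25600 0
```

Note how reading one byte too far into the word already multiplies the value by 256, without any error being raised.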
Threat Surface
CALLDATALOAD is the primary mechanism by which contracts read function arguments from transaction input. Its security significance arises from a deceptively simple property: out-of-bounds reads never revert; they silently return zero-padded data. When idx + 32 exceeds calldatasize, the missing bytes are filled with 0x00, and execution continues as if valid data had been provided.
This zero-padding behavior is by design (it mirrors memory read semantics), but it creates a broad threat surface:
- Short calldata attacks: If a caller submits fewer bytes than the ABI specification requires, CALLDATALOAD silently pads with zeros. This can shift parameter boundaries, effectively multiplying amounts by powers of 256 (the ERC-20 short address attack) or zeroing out trailing arguments.
- ABI encoding/decoding bugs: The EVM itself doesn’t enforce ABI encoding correctness. CALLDATALOAD reads raw bytes at offsets the compiler trusts to be well-formed. Malformed, non-canonical, or “dirty” calldata can cause the contract to decode garbage as valid parameters.
- Calldata injection via dynamic types: ABI-encoded dynamic types (bytes, string, arrays) use offset pointers that CALLDATALOAD reads. A crafted calldata payload can set these offsets to point anywhere within (or beyond) the calldata buffer, causing the contract to read attacker-controlled data from unexpected locations.
- Reading past calldata boundary: Any CALLDATALOAD at an offset beyond calldatasize returns a full 32 bytes of zeros. Contracts that use this value without checking calldatasize may operate on phantom zero-values with real consequences.
Smart Contract Threats
T1: Short Calldata Zero-Padding — ERC-20 Short Address Attack (Critical)
When transaction calldata is shorter than expected by the ABI specification, CALLDATALOAD pads the missing bytes with zeros. For ERC-20 transfer(address,uint256) calls, the standard encoding is:
4 bytes: function selector (0xa9059cbb)
32 bytes: address (left-padded to 32 bytes, last 20 bytes are the address)
32 bytes: uint256 amount
If the address parameter is submitted as 19 bytes instead of 20 (omitting a trailing zero byte), the calldata is one byte short and everything after the address moves forward by one byte: the amount’s leading zero byte completes the address word, and the amount word now extends one byte past the end of the calldata. CALLDATALOAD pads that missing byte with 0x00, effectively left-shifting the amount by 8 bits and multiplying it by 256.
Normal (68 bytes): a9059cbb | 000...abcdef1234567800 | 000...000064 (100 tokens)
Sent (67 bytes): a9059cbb | 000...abcdef12345678 | 000...000064
Read by the EVM: a9059cbb | 000...abcdef12345678[00] | 00...000064[00] (25,600 tokens)
[00] in the address word: the amount's leading zero byte
[00] in the amount word: zero-padded by the EVM
The EVM does not validate calldata length against the function signature. CALLDATALOAD happily reads whatever is there and pads the rest. The vulnerability sits in the off-chain application (exchange, wallet) that constructs the short calldata, but the EVM’s zero-padding behavior is what makes the exploit mechanically possible.
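The byte shift can be reproduced with a short Python model. This is a sketch under stated assumptions: `naive_encode_transfer` is a hypothetical stand-in for a buggy off-chain encoder that never checks the address length, and the address value is made up.

```python
SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def calldataload(calldata: bytes, idx: int) -> int:
    """Model of CALLDATALOAD: 32 bytes starting at idx, zero-padded on the right."""
    return int.from_bytes(calldata[idx:idx + 32].ljust(32, b"\x00"), "big")


def naive_encode_transfer(address: bytes, amount: int) -> bytes:
    """Hypothetical buggy encoder: prepends 12 zero bytes but never checks len(address) == 20."""
    return SELECTOR + b"\x00" * 12 + address + amount.to_bytes(32, "big")


full_address = bytes.fromhex("ab" * 19 + "00")  # attacker's real address ends in 0x00
short_address = full_address[:-1]               # submitted without the trailing zero

calldata = naive_encode_transfer(short_address, 100)

# What the token contract reads:
to = calldataload(calldata, 4) & ((1 << 160) - 1)  # low 20 bytes of the first word
amount = calldataload(calldata, 36)

print(len(calldata), to.to_bytes(20, "big") == full_address, amount)
# prints: 67 True 25600
```

The recipient is still the attacker's full address, because the amount's leading zero byte fills the gap, while the amount itself has gained a trailing zero byte.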
T2: ABI Decoding Vulnerabilities — Non-Canonical Calldata (High)
The ABI encoding specification is a convention, not an EVM-enforced rule. CALLDATALOAD reads raw 32-byte words at offsets computed by the compiler. If a caller submits non-canonical calldata (extra bytes, misaligned offsets, duplicate data), the contract may decode it differently than intended:
- Dirty higher-order bits: Address parameters should be zero-padded in the upper 12 bytes. If they aren’t, CALLDATALOAD still returns all 32 bytes. Compiler-generated decoders deal with this (ABI coder v1 masks to 20 bytes with AND(value, 0xFFFF...FFFF); ABI coder v2, the default since Solidity 0.8.0, reverts), but contracts using inline assembly may skip the check and treat the dirty bits as part of the address.
- Overlapping offset pointers: Dynamic types use offset pointers (read via CALLDATALOAD) that point to where the actual data begins. Nothing prevents two dynamic parameters from pointing to the same data, or to the offset field of another parameter, creating aliasing bugs.
- ABI re-encoding bugs in Solidity: The compiler itself has had bugs (for example the head overflow bug disclosed in August 2022 and fixed in 0.8.16) where ABI re-encoding of calldata tuples containing mixed static/dynamic types could overflow, corrupting the encoding of dynamic components.
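The dirty-bits case from the list above can be sketched in Python. The two policies (mask or reject) are modelled by a hypothetical `decode_address` helper; the selector `deadbeef` and the address are placeholders, not values from a real contract.

```python
ADDRESS_MASK = (1 << 160) - 1  # low 20 bytes


def calldataload(calldata: bytes, idx: int) -> int:
    """Model of CALLDATALOAD: 32 bytes starting at idx, zero-padded on the right."""
    return int.from_bytes(calldata[idx:idx + 32].ljust(32, b"\x00"), "big")


def decode_address(word: int, strict: bool) -> int:
    """Hypothetical decoder: mask the upper 12 bytes, or reject them when strict."""
    masked = word & ADDRESS_MASK
    if strict and masked != word:
        raise ValueError("dirty higher-order bits in address word")
    return masked


address = bytes.fromhex("11" * 20)
# Non-canonical encoding: the 12 padding bytes are 0xff instead of 0x00
calldata = bytes.fromhex("deadbeef") + b"\xff" * 12 + address

raw = calldataload(calldata, 4)               # what unmasked assembly would use
lenient = decode_address(raw, strict=False)   # masking policy

print(raw == lenient, lenient == int.from_bytes(address, "big"))
# prints: False True
```

Code that compares or stores `raw` directly would treat two encodings of the same address as different values, which is the failure mode the bullet describes.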
T3: Calldata Injection via Dynamic Type Offsets (Critical)
Dynamic ABI types (bytes, string, uint256[]) are encoded as a head (offset pointer) followed by a tail (length + data). CALLDATALOAD reads the offset pointer from the head section, then reads the actual data from the pointed-to tail location. An attacker who can influence the offset pointer can redirect the read to any location in the calldata:
Standard ABI layout:
[0x00] selector
[0x04] offset to bytes param = 0x40 (relative to 0x04, so the tail starts at 0x44)
[0x24] uint256 param
[0x44] length of bytes
[0x64] bytes data...
Malicious layout:
[0x04] offset to bytes param = 0x20 (the tail now "starts" at 0x24, the uint256 param!)
[0x24] uint256 param (now re-read as bytes length)
The 1inch Fusion V1 exploit ($5M, March 2025) used this class of attack: crafted calldata with a negative interactionLength value caused an integer underflow in the offset calculation, allowing the attacker to overwrite the resolver address in memory and redirect settlement funds to their own address.
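The aliasing in the malicious layout can be modelled in Python. This is a sketch for a hypothetical function `f(bytes data, uint256 x)`: `decode_naive` follows the offset blindly, while `decode_checked` applies a deliberately strict illustrative policy (offsets must point past the head and the tail must fit in calldata). Neither is meant to reproduce what any particular compiler emits, and the selector is a placeholder.

```python
SELECTOR = bytes.fromhex("deadbeef")  # placeholder selector
HEAD_SIZE = 0x40                      # two 32-byte head slots


def calldataload(calldata: bytes, idx: int) -> int:
    """Model of CALLDATALOAD: 32 bytes starting at idx, zero-padded on the right."""
    return int.from_bytes(calldata[idx:idx + 32].ljust(32, b"\x00"), "big")


def word(value: int) -> bytes:
    return value.to_bytes(32, "big")


def decode_naive(cd: bytes):
    """Follows the offset pointer without any validation."""
    offset = calldataload(cd, 4)
    x = calldataload(cd, 4 + 32)
    length = calldataload(cd, 4 + offset)
    start = 4 + offset + 32
    return cd[start:start + length], x


def decode_checked(cd: bytes):
    """Same decoding, but rejects offsets into the head and out-of-bounds tails."""
    offset = calldataload(cd, 4)
    if offset < HEAD_SIZE:
        raise ValueError("offset points into the head section")
    if 4 + offset + 32 > len(cd):
        raise ValueError("length word is out of bounds")
    length = calldataload(cd, 4 + offset)
    start = 4 + offset + 32
    if start + length > len(cd):
        raise ValueError("data runs past calldatasize")
    return cd[start:start + length], calldataload(cd, 4 + 32)


honest = SELECTOR + word(0x40) + word(7) + word(3) + b"abc".ljust(32, b"\x00")
# Offset 0x20 makes the uint256 slot double as the length of `data`
malicious = SELECTOR + word(0x20) + word(2) + b"hi".ljust(32, b"\x00")

print(decode_naive(honest))     # (b'abc', 7)
print(decode_naive(malicious))  # (b'hi', 2): x and the length are the same word
```

In the malicious case the value 2 is read twice, once as `x` and once as the length of `data`, which is the aliasing described above.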
T4: Reading Past Calldata Boundary Returns Zeros (Medium)
Any CALLDATALOAD at offset >= calldatasize returns 32 bytes of zeros. Contracts that read optional parameters or variable-length data without first checking calldatasize may silently operate on zero values:
// If msg.data is only 4 bytes (just the selector), CALLDATALOAD(4) returns 0
function withdraw(uint256 amount) external {
    // In a decoder with no calldata length check (e.g., hand-written assembly),
    // amount = 0 if calldata is short: no revert, just a zero-amount withdrawal
    _withdraw(msg.sender, amount);
}
While a zero-amount operation is usually benign, in contracts where zero has special meaning (e.g., “withdraw all”, “use default”, or “skip validation”), it becomes exploitable.
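A minimal Python sketch of the difference a length check makes, assuming a decoder that reads the `withdraw(uint256)` argument by hand. The helper names are hypothetical.

```python
WITHDRAW_SELECTOR = bytes.fromhex("2e1a7d4d")  # withdraw(uint256)


def calldataload(calldata: bytes, idx: int) -> int:
    """Model of CALLDATALOAD: 32 bytes starting at idx, zero-padded on the right."""
    return int.from_bytes(calldata[idx:idx + 32].ljust(32, b"\x00"), "big")


def read_amount_unchecked(calldata: bytes) -> int:
    """Reads the argument without looking at calldatasize."""
    return calldataload(calldata, 4)


def read_amount_checked(calldata: bytes) -> int:
    """Requires selector + one full word before reading the argument."""
    if len(calldata) < 4 + 32:
        raise ValueError("calldata too short for withdraw(uint256)")
    return calldataload(calldata, 4)


selector_only = WITHDRAW_SELECTOR
full_call = WITHDRAW_SELECTOR + (1000).to_bytes(32, "big")

print(read_amount_unchecked(selector_only))  # 0: a phantom argument
print(read_amount_checked(full_call))        # 1000
```

With the check in place, a caller cannot trigger the zero-means-something-special path simply by omitting the argument; they have to encode an explicit zero.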
T5: Function Selector Extraction at Offset 0 (Low)
CALLDATALOAD at offset 0 reads the first 32 bytes of calldata, which includes the 4-byte function selector in the highest-order bytes. Contracts that manually parse the selector (common in assembly-optimized routers, proxies, and diamond patterns) must right-shift by 224 bits or mask correctly. Off-by-one errors in the shift or mask can cause selector collisions or misrouted calls:
assembly {
    let selector := shr(224, calldataload(0)) // Correct: extracts top 4 bytes
    // Bug: shr(256, calldataload(0)) == 0 always
    // Bug: calldataload(0) without shift includes argument bytes
}
Protocol-Level Threats
P1: No DoS Vector (Low)
CALLDATALOAD costs a fixed 3 gas regardless of the offset value. Reading at offset 2^256 - 1 costs the same as reading at offset 0. It does not expand memory, allocate resources, or perform I/O. It cannot be used for gas griefing or computational DoS.
P2: Consensus Safety (Low)
CALLDATALOAD is trivially deterministic: it reads from the transaction’s immutable calldata and pads with zeros for out-of-bounds reads. All EVM client implementations agree on this behavior. No known consensus divergence has occurred due to CALLDATALOAD.
P3: Calldata Immutability Assumption (Low)
Calldata is read-only within a call frame: CALLDATALOAD can never return different values for the same offset within the same call context. This is a security positive: unlike memory, calldata cannot be corrupted mid-execution. However, every CALL or DELEGATECALL creates a new frame whose calldata is exactly the input buffer the caller passes. DELEGATECALL preserves msg.sender and msg.value, but not calldata. Proxies usually forward the original calldata unchanged, so both frames see the same bytes; a proxy that strips, prepends, or rewrites bytes before forwarding shifts every offset the implementation reads. This is by design, but misunderstanding it can lead to parameter confusion in proxy patterns.
P4: Compiler Trust in Calldata Layout (Medium)
The Solidity compiler generates CALLDATALOAD instructions at fixed offsets, trusting that calldata conforms to ABI encoding. The EVM does not enforce this trust. Any transaction can contain arbitrary bytes as calldata, and CALLDATALOAD will read them without validation. This gap between the compiler’s assumptions and the EVM’s permissiveness is the root cause of all calldata injection vulnerabilities.
Edge Cases
| Edge Case | Behavior | Security Implication |
|---|---|---|
| Offset = 0 | Returns first 32 bytes (function selector + first 28 bytes of args) | Used for manual selector parsing; shift errors cause misrouting |
| Offset = 4 | Returns first argument (standard ABI) | Normal parameter read |
| Offset > calldatasize | Returns 32 bytes of zeros | Silent zero-value; no revert |
| Partial overlap (offset + 32 > calldatasize) | Reads available bytes, zero-pads the rest on the right | Short calldata attack vector; partial arguments appear valid |
| Very large offset (e.g., 2^255) | Returns 32 bytes of zeros | Same as any out-of-bounds; costs 3 gas |
| Offset = calldatasize - 1 | Returns 1 real byte + 31 zero bytes | Edge of valid/invalid boundary |
| Empty calldata (calldatasize = 0) | Any offset returns 32 zero bytes | Fallback/receive context; all CALLDATALOAD reads are zero |
| Offset = calldatasize | Returns 32 zero bytes | First fully out-of-bounds read |
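The rows of the table can be checked against a small Python model of the opcode. This is a sketch: the 36-byte sample calldata (bytes 0x01 through 0x24) is arbitrary and chosen so that each byte is recognizable in the output.

```python
def calldataload(calldata: bytes, idx: int) -> int:
    """Model of CALLDATALOAD: 32 bytes starting at idx, zero-padded on the right."""
    return int.from_bytes(calldata[idx:idx + 32].ljust(32, b"\x00"), "big")


def as_bytes(value: int) -> bytes:
    return value.to_bytes(32, "big")


calldata = bytes(range(1, 37))  # 36 bytes: 0x01 .. 0x24
size = len(calldata)

cases = {
    "offset 0": calldataload(calldata, 0),
    "partial overlap (offset 8)": calldataload(calldata, 8),
    "offset = calldatasize - 1": calldataload(calldata, size - 1),
    "offset = calldatasize": calldataload(calldata, size),
    "very large offset (2**255)": calldataload(calldata, 2**255),
    "empty calldata": calldataload(b"", 0),
}

for name, value in cases.items():
    print(f"{name:28} {as_bytes(value).hex()}")
```

Every case returns a full 32-byte word; the only difference between rows is how many of those bytes come from calldata and how many are padding.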
Real-World Exploits
Exploit 1: ERC-20 Short Address Attack — Golem/Poloniex (April 2017)
Root cause: Off-chain application (Poloniex exchange) failed to validate that user-supplied addresses were exactly 20 bytes before constructing ABI-encoded calldata for ERC-20 transfer().
Details: A Golem (GNT) token holder discovered that Poloniex did not validate the length of withdrawal addresses. An attacker could register an address whose last byte was 0x00, then submit it as a 19-byte address (omitting the trailing zero). The exchange’s backend would encode the transfer calldata with a 19-byte address, making the calldata one byte short and shifting everything after the address forward by one byte: the amount’s leading zero byte completed the address word, and the amount word then ran one byte past the end of calldata, where CALLDATALOAD zero-padded it. This effectively multiplied the transfer amount by 256.
For example, a withdrawal of 100 tokens (0x64) would become 25,600 tokens (0x6400) because the amount word ended one byte past the calldata boundary and CALLDATALOAD filled that byte with 0x00, left-shifting the value by one byte.
CALLDATALOAD’s role: The EVM’s CALLDATALOAD instruction read the shifted amount field and zero-padded the missing byte, producing a value 256x larger than intended. The EVM does not enforce minimum calldata length or ABI conformance.
Impact: Vulnerability discovered before active exploitation. Could have drained all GNT (and any ERC-20 token) held by Poloniex. The attack class affected every exchange and wallet that did not validate input length.
References:
- Eric Rafaloff: Analyzing the ERC20 Short Address Attack
- Golem: How to Find $10M Just by Reading the Blockchain
- CoinDesk: Exchange Bug Discovery Averts Ethereum Token Theft
Exploit 2: 1inch Fusion V1 Calldata Corruption — $5M Stolen (March 2025)
Root cause: Unsafe calldata reconstruction in Yul assembly code that read an attacker-controlled interactionLength value via CALLDATALOAD without validation, triggering an integer underflow in offset calculations.
Details: The 1inch Fusion V1 Settlement contract’s _settleOrder() function used Yul assembly to parse serialized order data from calldata. The function read an interactionLength parameter from calldata using CALLDATALOAD and used it to compute the memory offset where the order suffix (containing the resolver address) would be written. The attacker supplied a negative interactionLength value (0xFFFF...FE00 = -512 in two’s complement), causing the offset calculation to underflow. This wrote the order suffix to an unintended memory location, overwriting the resolver address with the attacker’s own address.
The attacker crafted a nested chain of 6 orders with carefully calculated ABI offsets, exploiting the 544 bytes of zero-padding that standard ABI encoding places between order data structures. When resolveOrders() was called, the corrupted resolver address directed settlement funds to the attacker.
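The arithmetic behind the "negative" length can be shown in Python. This is a sketch of 256-bit wrap-around only: `base_offset` is a made-up value for illustration, not the offset used by the real Settlement contract, and `checked_length` is a hypothetical guard.

```python
MOD = 1 << 256  # EVM words are 256-bit; ADD wraps modulo 2**256


def evm_add(a: int, b: int) -> int:
    """Model of the EVM ADD opcode (unchecked, wrapping)."""
    return (a + b) % MOD


def to_signed(value: int) -> int:
    """Interpret a 256-bit word as a two's complement signed integer."""
    return value - MOD if value >> 255 else value


interaction_length = MOD - 512  # 0xFFFF...FE00, as read from calldata
base_offset = 0x480             # hypothetical memory offset

target = evm_add(base_offset, interaction_length)

print(hex(interaction_length)[-8:], to_signed(interaction_length), hex(target))
# prints: fffffe00 -512 0x280


def checked_length(length: int, calldatasize: int) -> int:
    """Hypothetical guard: a length can never exceed the calldata that contains it."""
    if length > calldatasize:
        raise ValueError("length exceeds calldatasize")
    return length
```

Because the addition wraps, adding the huge unsigned value moves the write pointer 512 bytes backwards. A bound against calldatasize rejects it, since no real calldata is anywhere near 2^256 - 512 bytes long.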
CALLDATALOAD’s role: CALLDATALOAD read the malicious interactionLength value from attacker-controlled calldata at a specific offset. The instruction returned the raw bytes without any validation — negative values, oversized values, and underflow-triggering values are all valid 32-byte words to CALLDATALOAD.
Impact: $5M drained from TrustedVolumes, a market-maker resolver contract. Most funds were later returned via negotiation. The vulnerability survived multiple security audits because it required assembly-level understanding of calldata parsing and memory layout.
References:
- Coinspect: 1inch Calldata Corruption (Learn EVM Attacks)
- BlockSec: 1inch Incident - From Calldata Corruption to Forged Settlement
- Decurity: Yul Calldata Corruption — 1inch Postmortem
Exploit 3: Solidity Head Overflow in Calldata Tuple ABI-Reencoding (August 2022)
Root cause: Solidity compiler bug (versions 0.5.8 through 0.8.15) in ABI re-encoding of calldata tuples. When re-encoding a tuple containing both dynamic and static components, an overly aggressive cleanup mstore for fixed-size arrays corrupted 32 bytes of the adjacent dynamic component’s encoding.
Details: When a function received a calldata tuple containing a dynamic type (e.g., bytes) followed by a statically-sized array of uint256 or bytes32, the compiler-generated re-encoding code used mstore (which always writes 32 bytes) to zero-pad the static array’s unused space. This write overflowed into the first 32 bytes of the dynamic component, corrupting its encoded length or data.
An attacker could craft calldata where the overflow corrupted the length field of a dynamic bytes parameter, causing the contract to decode an attacker-controlled number of bytes as the parameter value. This enabled reading past the intended parameter boundary into adjacent or subsequent calldata fields.
CALLDATALOAD’s role: The vulnerability chain started with CALLDATALOAD reading the tuple’s components from calldata. While the read itself was correct, the subsequent memory copy and re-encoding operated on assumptions about component layout that broke under the overflow. The corrupted re-encoding could then be passed to internal functions that re-read from memory, producing different values than what CALLDATALOAD originally delivered.
Impact: Affected all contracts compiled with Solidity 0.5.8–0.8.15 that re-encoded calldata tuples with mixed dynamic/static types. No known active exploitation, but the bug class could enable parameter confusion, length corruption, and data exfiltration from adjacent calldata fields.
References:
- Solidity Blog: Head Overflow Bug in Calldata Tuple ABI-Reencoding
- Eocene: Solidity Compiler Vulnerability Analysis - Head Overflow in ABIv2-Reencoding
Exploit 4: Solidity ABI Decoder Bug — Multi-Dimensional Memory Arrays (April 2021)
Root cause: Solidity ABI coder v2 bug where abi.decode() of specially crafted calldata for multi-dimensional memory arrays could read values from memory outside the intended data area.
Details: When decoding ABI-encoded data containing nested dynamic arrays, the Solidity compiler generated code that used CALLDATALOAD to read offset pointers, then copied data to memory. A crafted payload with overlapping or out-of-bounds offset pointers caused the decoder to read memory outside the allocated decoding buffer. This could expose unrelated memory contents (internal state, other variables) as if they were part of the decoded array.
A related issue (Solidity #7114) showed that abi.decode() could produce results where two decoded arrays shared the same memory location, meaning modification of one array silently modified the other.
CALLDATALOAD’s role: CALLDATALOAD read the malicious offset pointers from calldata without bounds checking. The EVM returned whatever 32-byte value was at the requested offset, and the compiler’s decoder trusted these values to point within the calldata buffer.
Impact: Fixed in Solidity 0.8.4. Contracts using abi.decode() on untrusted calldata with nested dynamic arrays were vulnerable to memory disclosure and aliasing.
References:
- Solidity Blog: Decoding from Memory Bug
- Solidity Issue #7114: abi.decode() results can occupy the same memory location
Attack Scenarios
Scenario A: ERC-20 Short Address Token Theft
// Standard ERC-20 transfer function
contract VulnerableToken {
    mapping(address => uint256) public balances;

    function transfer(address to, uint256 amount) external returns (bool) {
        // 'to' and 'amount' are decoded from calldata by the compiler:
        //   to     = CALLDATALOAD(4)  -- masked to the low 20 bytes
        //   amount = CALLDATALOAD(36)
        //
        // If calldata is 67 bytes instead of 68 and no length check runs first:
        //   to     = 0x...abcdef1234567800 (19 supplied bytes + the amount's leading zero byte)
        //   amount = CALLDATALOAD(36) reads 31 real bytes + 1 zero-padded byte
        //          = original_amount * 256
        require(balances[msg.sender] >= amount);
        balances[msg.sender] -= amount;
        balances[to] += amount;
        return true;
    }
}
// Attack: Exchange constructs calldata with a 19-byte address
// Attacker requests withdrawal of 100 tokens → exchange sends 25,600 tokens
Scenario B: Calldata Injection via Dynamic Type Offset Manipulation
contract VulnerableRouter {
    function execute(bytes calldata data, address resolver) external {
        // Compiler trusts ABI encoding:
        //   data offset = CALLDATALOAD(4)
        //   resolver    = CALLDATALOAD(36)
        //
        // Attacker crafts calldata where 'data' offset points to resolver's slot:
        //   CALLDATALOAD(4) = 0x20 (relative offset; absolute offset 36, where resolver lives)
        //   The contract reads resolver's address bytes as the length of 'data'
        //   Subsequent reads pull from attacker-controlled regions
        _executeWithResolver(data, resolver);
    }
}
Scenario C: Zero-Padded Optional Parameter Exploitation
contract VulnerableVault {
    function withdraw(uint256 amount, uint256 minOut) external {
        // If caller sends only 36 bytes (selector + amount, no minOut)
        // and the decoder does not check calldatasize:
        //   minOut = CALLDATALOAD(36) = 0 (zero-padded, no revert)
        // Slippage protection is silently disabled!
        uint256 received = _swap(amount);
        require(received >= minOut); // received >= 0, always passes
        _send(msg.sender, received);
    }
}
// Attack: Call withdraw(1000) with only 36 bytes of calldata
// minOut defaults to 0, bypassing slippage protection during a sandwich attack
Scenario D: Assembly Calldata Parsing with Offset Error
contract VulnerableProxy {
    fallback() external payable {
        assembly {
            // Read target address from calldata
            let target := calldataload(0)
            // Bug: offset 0 reads selector + first 28 bytes of first arg
            // The developer intended calldataload(4) to skip the selector,
            // or calldataload(0) with shr(96, ...) to extract an address
            // With this bug, the "target" includes the selector bytes,
            // causing calls to be routed to unintended addresses
            calldatacopy(0, 4, sub(calldatasize(), 4))
            let result := delegatecall(gas(), target, 0, sub(calldatasize(), 4), 0, 0)
            returndatacopy(0, 0, returndatasize())
            switch result
            case 0 { revert(0, returndatasize()) }
            default { return(0, returndatasize()) }
        }
    }
}
Mitigations
| Threat | Mitigation | Implementation |
|---|---|---|
| T1: Short calldata / short address | Validate calldata length matches expected ABI encoding | require(msg.data.length >= 68) for 2-param functions; off-chain: validate address length before encoding |
| T1: Off-chain encoding | Use well-tested ABI encoding libraries | web3.js, ethers.js, and viem all validate parameter sizes before encoding |
| T2: Non-canonical calldata | Use Solidity’s type system; avoid raw assembly calldata parsing | Let the compiler handle ABI decoding; it masks dirty bits on addresses |
| T2: Dirty higher-order bits | Always mask address values in assembly | and(calldataload(offset), 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF) |
| T3: Calldata injection | Validate offset pointers and lengths from calldata | require(offset < calldatasize()) and bounds-check all dynamic type lengths |
| T3: Assembly calldata parsing | Prefer Solidity’s abi.decode() over manual parsing | The compiler generates bounds checks (post-0.8.4) for standard ABI decoding |
| T4: Zero-padded out-of-bounds reads | Check msg.data.length before reading optional parameters | require(msg.data.length >= MIN_EXPECTED_LENGTH) |
| T5: Selector parsing errors | Use Solidity’s function dispatch (automatic) or well-tested router patterns | For assembly: shr(224, calldataload(0)) extracts the 4-byte selector correctly |
| General | Keep Solidity compiler up to date | >= 0.8.16 fixes known ABI re-encoding bugs; >= 0.8.4 fixes decoder memory bugs |
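For the off-chain half of the T1 mitigation, a strict encoder might look like the following Python sketch. The function name `encode_transfer` is hypothetical and stands in for whatever ABI library the application uses; the point is only that length and range are validated before any bytes are concatenated.

```python
SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def encode_transfer(address_hex: str, amount: int) -> bytes:
    """Encode transfer(address,uint256), rejecting malformed inputs up front."""
    if address_hex.startswith("0x"):
        address_hex = address_hex[2:]
    raw = bytes.fromhex(address_hex)  # raises ValueError on non-hex input
    if len(raw) != 20:
        raise ValueError(f"address must be 20 bytes, got {len(raw)}")
    if not 0 <= amount < 2**256:
        raise ValueError("amount does not fit in uint256")
    calldata = SELECTOR + b"\x00" * 12 + raw + amount.to_bytes(32, "big")
    assert len(calldata) == 68  # 4-byte selector + two 32-byte words
    return calldata


good = encode_transfer("0x" + "ab" * 19 + "00", 100)
print(len(good))  # 68

try:
    encode_transfer("0x" + "ab" * 19, 100)  # 19-byte address
except ValueError as err:
    print("rejected:", err)
```

Rejecting the 19-byte address here means the short calldata is never built, regardless of how the token contract would have decoded it.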
Compiler/EIP-Based Protections
- Solidity >= 0.5.0: The ABI decoder reverts at the start of an external function if calldata is shorter than the minimum required for the function’s parameters, or if an offset points out of bounds. This mitigates T1 at the contract level (but not at the off-chain encoding level).
- Solidity >= 0.8.4: Fixes multi-dimensional array decoding bug. Adds additional bounds checks for nested dynamic type decoding.
- Solidity >= 0.8.16: Fixes the head overflow bug in calldata tuple ABI re-encoding. Prevents static array cleanup from corrupting adjacent dynamic components.
- EIP-3860 (Shanghai): Limits initcode size, reducing the attack surface for deployment-time calldata manipulation.
Severity Summary
| Threat ID | Category | Severity | Likelihood | Real-World Precedent |
|---|---|---|---|---|
| T1 | Smart Contract | Critical | Medium | Golem/Poloniex short address (2017), class of exchange bugs |
| T2 | Smart Contract | High | Medium | Solidity ABI re-encoding bugs (2021, 2022) |
| T3 | Smart Contract | Critical | Medium | 1inch Fusion V1 ($5M, 2025) |
| T4 | Smart Contract | Medium | Medium | Theoretical; slippage bypass patterns |
| T5 | Smart Contract | Low | Low | Assembly router bugs |
| P1 | Protocol | Low | N/A | — |
| P2 | Protocol | Low | N/A | — |
| P3 | Protocol | Low | Low | Delegatecall calldata inheritance confusion |
| P4 | Protocol | Medium | Medium | Root cause of all calldata injection vulnerabilities |
Related Opcodes
| Opcode | Relationship |
|---|---|
| CALLDATASIZE (0x36) | Returns the length of calldata. Essential for bounds-checking before CALLDATALOAD. Mitigates T1 and T4 when used to validate minimum calldata length. |
| CALLDATACOPY (0x37) | Copies calldata to memory. Same zero-padding behavior for out-of-bounds reads. Used for bulk calldata parsing (e.g., abi.decode); shares the same threat surface as CALLDATALOAD. |
| CODECOPY (0x39) | Copies contract bytecode to memory. Similar zero-padding for out-of-bounds. Analogous read primitive but for code instead of calldata. |
| RETURNDATACOPY (0x3E) | Copies from the return data buffer to memory. Unlike CALLDATALOAD and CALLDATACOPY, reading out of bounds does not zero-pad: it aborts execution with an exceptional halt (EIP-211, Byzantium onward). The difference in error handling is a frequent source of confusion. |
| MLOAD (0x51) | Reads from memory. Memory is also zero-initialized, so MLOAD shares the silent-zero-read property. But memory is mutable, while calldata is immutable within a call context. |
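The difference between zero-padding reads and aborting reads can be sketched in Python. This models only the bounds behavior of CALLDATACOPY and of RETURNDATACOPY as specified in EIP-211; memory expansion and gas are ignored, and the exception class is an illustrative stand-in for an exceptional halt.

```python
class ExceptionalHalt(Exception):
    """Stand-in for the EVM aborting the current call frame."""


def calldatacopy(calldata: bytes, offset: int, size: int) -> bytes:
    """Out-of-bounds bytes are silently filled with zeros."""
    return calldata[offset:offset + size].ljust(size, b"\x00")


def returndatacopy(returndata: bytes, offset: int, size: int) -> bytes:
    """Reading past the end of the return data buffer aborts execution."""
    if offset + size > len(returndata):
        raise ExceptionalHalt("return data out of bounds")
    return returndata[offset:offset + size]


buffer = b"\x01\x02\x03\x04"

print(calldatacopy(buffer, 2, 4).hex())    # 03040000
print(returndatacopy(buffer, 2, 2).hex())  # 0304

try:
    returndatacopy(buffer, 2, 4)
except ExceptionalHalt as err:
    print("halt:", err)
```

The same out-of-bounds request that quietly yields padded zeros from calldata stops execution when made against return data.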