Back to blog
Classification & Methodology

How to Clean Crypto Datasets Before Filing: Tax Data Engineering Guide 2025

Professional guide to clean, debug, and prepare your cryptocurrency data before filing your tax return. Learn data engineering techniques applied to crypto taxation so you can obtain accurate reports for the Spanish Tax Agency (AEAT).

E

Cleriontax Team

Crypto Tax and Data Analysis Experts

10 min read
Data EngineeringData CleaningData CleaningCSVNormalizationDuplicatesFIFOAEATIncome Tax ReturnForm 100ETLDatasetsConsolidationTraceability
Cómo limpiar datasets de criptomonedas antes de declarar - Guía de ingeniería de datos fiscal para preparar CSV y calcular correctamente FIFO ante la AEAT
7 de diciembre de 2025
10 min de lectura
Clasificación y Metodología
Data EngineeringData CleaningData CleaningCSVNormalizationDuplicatesFIFOAEATIncome Tax ReturnForm 100ETLDatasetsConsolidationTraceability

The difference between a correct cryptocurrency tax return and one that causes problems with the Tax Authority does not lie only in knowing the regulations. In practice, the real challenge is the quality of the data you use to calculate your capital gains and losses. After exporting your history from different exchanges and wallets, you end up with a set of CSV files that, in their raw state, contain inconsistencies, duplicates, empty fields, and incompatible formats that will make it impossible to correctly calculate the mandatory FIFO method in Spain.

At Cleriontax we apply data engineering methodologies to every tax report we prepare, because we know that a dirty dataset produces wrong results regardless of how good the calculation tool is. This article will teach you how to prepare your data the way we do, applying professional cleaning and normalization techniques that ensure the accuracy of your return before the AEAT.

Why data cleaning is critical for your tax return

When you export your transaction history from Binance, Coinbase, Kraken or any exchange, you get CSV files that were designed for internal use by each platform, not for tax compliance. Each exchange uses its own format, its own names for transaction types, and its own criteria for recording trades. If you also have activity in wallets like Metamask and have exported data from blockchain explorers, the heterogeneity of formats multiplies exponentially.

The fundamental problem is that the AEAT requires precise, chronologically ordered data expressed in euros to correctly apply the FIFO method. A single duplicate in your data can completely distort the acquisition cost of your cryptocurrencies, generating incorrect capital gains or losses.

Most frequent errors in raw datasets

Error typeTax consequenceFrequency
Duplicate transactionsDouble counting, incorrect FIFOVery high
Dates in incompatible formatsWrong tax year, incorrect orderingHigh
Misclassified internal operationsFictitious capital gainsHigh
Values not converted to EURCalculations in mixed currenciesMedium
Empty or null fieldsErrors in formulas and calculationsMedium
Different names for the same conceptIncorrect tax classificationHigh

Each of these issues, if not corrected before calculating capital gains, can result in an incorrect tax return with the corresponding penalties from the Tax Authority.

Phase 1: Inventory and initial audit of data sources

Before modifying any file, it is essential to create a complete inventory of all the data sources you need to consolidate. This initial audit phase determines the scope of the cleaning work and allows you to identify potential information gaps.

Start by creating an exhaustive list of all exchanges where you have traded during the tax year, including platforms you no longer use but where you carried out operations in the period to be reported. For each exchange, identify which types of export you have generated by following the specific guides for each platform. If you have not completed the export from any exchange, we recommend checking our guide to correctly export your Binance history as a methodological reference.

Source inventory checklist

Centralized exchanges:

  • Identify all platforms used (active and inactive)
  • Verify that all types of history have been exported (trades, staking, earn, conversions)
  • Document the date range of each export
  • Note the format of each file (CSV, XLSX, JSON)

Non-custodial wallets:

  • List all wallet addresses used
  • Identify the blockchains where each wallet has activity
  • Verify exports from Etherscan, Polygonscan, etc.
  • Include both transactions and token transfers

Recommended folder structure:

📁 Tax_Data_2024/
├── 📁 01_Original_Intact/
│   ├── 📁 Binance/
│   ├── 📁 Coinbase/
│   ├── 📁 Metamask_ETH/
│   └── 📁 Metamask_Polygon/
├── 📁 02_In_Progress/
└── 📁 03_Final_Consolidated/

Always keep an untouched copy of the exported files before making any changes. This is essential both for internal audits and for potential AEAT inquiries where you may need to demonstrate the traceability of your calculations.

Phase 2: Normalizing date and time formats

Date formats are one of the most frequent and potentially serious problems in cryptocurrency datasets. Each exchange uses its own format, and the differences may seem subtle but have significant consequences for tax calculations.

Date formats by platform

PlatformOriginal formatExampleTime zone
BinanceYYYY-MM-DD HH:MM:SS2024-06-15 14:32:18UTC
CoinbaseISO 86012024-06-15T14:32:18ZUTC
KrakenUnix timestamp1718458338UTC
EtherscanUnix timestamp1718458338UTC
BitstampDD/MM/YYYY HH:MM15/06/2024 14:32CET

When these files are combined without normalization, chronological sorting can produce completely incorrect results because the software interprets the fields differently.

Date normalization protocol

The process must follow these steps in strict order:

  1. Identify the format of each source file and document it explicitly
  2. Convert all formats to ISO 8601 (YYYY-MM-DDTHH:MM:SS) because of its natural sortability
  3. Adjust time zone to Spain (CET/CEST depending on the date)
  4. Verify the conversion by comparing known dates with the original record

Critical time zone example:

A transaction on Etherscan: 2024-12-31 23:30:00 UTC

Converted to Spain (winter time): 2025-01-01 00:30:00 CET

Tax year: 2025, not 2024

This difference determines in which tax year the transaction must be reported, which can have significant implications if it is a material capital gain or loss.

Phase 3: Detecting and removing duplicates

Duplicate transactions are the most common and one of the most harmful errors in cryptocurrency datasets. They arise mainly from exports with overlapping date ranges and from operations that appear recorded in multiple ways in different export types from the same exchange.

To correctly detect duplicates, it is not enough to compare identical rows. You must implement an identification logic based on key fields that define the uniqueness of each transaction.

Key fields to identify duplicates

In centralized exchanges:

  • Exact date and time (down to seconds)
  • Trading pair (BTC/EUR, ETH/USDT, etc.)
  • Transaction type (buy, sell, trade)
  • Executed amount
  • Execution price

In blockchain transactions:

  • Transaction hash (txHash) → definitive unique identifier
  • If there is no hash: combination of block + from + to + value

Three-pass deduplication process

PassCriterionAction
FirstRows 100% identicalAutomatically delete
SecondKey fields match, secondary columns differDelete while keeping the row with more information
ThirdSimilar but not identical transactionsManual case-by-case review

Example of a subtle duplicate:

Row 1: 2024-06-15 14:32:18 | BTC/EUR | BUY | 0.05 | 62450.00 | fee: 0.0001
Row 2: 2024-06-15 14:32:18 | BTC/EUR | BUY | 0.05 | 62450.00 | fee: (empty)

Both represent the same transaction, but one has the fee and the other does not. Pass 2 should keep row 1.

Keep a record of all duplicates removed. This log serves as documentation of the cleaning process and allows you to verify that legitimate operations have not been deleted by mistake. If you work with analysis tools like those described in our guide to tools for tracking cryptocurrencies, many include duplicate detection features you can use for cross-checking.

Phase 4: Classifying and standardizing transaction types

Each exchange uses its own terminology to describe transaction types. What Binance calls "SPOT Trade" may appear as "Trade" on Coinbase, "Exchange" on Kraken, or simply "Buy" on other platforms. To correctly apply the tax treatment to each transaction, you need a standardized classification system.

Mapping table: original names → tax category

Original nameExchangeStandardized tax category
SPOT Trade, Trade, ExchangeVariousSwap / Capital gain
Buy, Purchase, CompraVariousAcquisition
Sell, VentaVariousDisposal / Capital gain
Convert, Conversion, SwapVariousSwap / Capital gain
Deposit, IngresoVariousInternal movement (verify origin)
Withdrawal, RetiroVariousInternal movement (verify destination)
Staking Reward, InterestVariousInvestment income
Airdrop, DistributionVariousIncome (value at market price)
Fee, Commission, GasVariousDeductible expense

Main tax categories

Correct classification of each transaction determines its tax treatment:

  • Capital gains/losses: Sales, swaps between cryptos, payments with crypto
  • Investment income: Staking, lending, farming, airdrops
  • Internal movements: Transfers between your own wallets (not taxable)
  • Deductible expenses: Commissions, gas fees, trading fees

A frequent error is to classify transfers between your own wallets as sales, which generates fictitious capital gains. To go deeper into the classification criteria according to the AEAT, we recommend reading our article on how to classify cryptocurrency transactions.

Phase 5: Converting values to euros

The AEAT requires that all transactions be reported in euros, valued at the exchange rate at the exact moment of each transaction. This requirement implies a conversion process that must be carried out accurately and with documented pricing sources.

Recommended pricing sources

Asset typePrimary sourceAlternative sourceGranularity
BTC, ETH, top 20CoinGecko APICoinMarketCapMinutes
Liquid altcoinsCoinGecko APIExchange where tradedHours
Illiquid tokensPrice on exchangeDEX (Uniswap, etc.)Transaction
USD stablecoinsEUR/USD exchange rateECB or OANDADaily

Conversion process for crypto-to-crypto transactions

For transactions carried out in crypto-crypto pairs, you need a two-step conversion:

Example: Swap ETH → LINK on Uniswap

1. Original transaction:
   - Sent: 2 ETH
   - Received: 150 LINK
   - Date: 2024-07-20 15:45:00 UTC

2. Step 1 – Value ETH in EUR:
   - ETH price on 2024-07-20 15:45: 3,200 €
   - Value sent: 2 × 3,200 = 6,400 €

3. Step 2 – Value LINK in EUR:
   - LINK price on 2024-07-20 15:45: 42.50 €
   - Value received: 150 × 42.50 = 6,375 €

4. Tax result:
   - Disposal value: 6,375 € (what you receive)
   - If ETH acquisition cost was 5,000 €:
   - Capital gain: 6,375 - 5,000 = 1,375 €

The process must document the source of each price used. This is especially important for high-value trades where a difference in pricing can mean thousands of euros difference in the calculated capital gain.

At Cleriontax we use our own validated historical price databases that allow us to apply accurate conversions even for illiquid tokens.

Phase 6: Handling empty fields and anomalous values

Empty fields and anomalous values in cryptocurrency datasets can have several legitimate causes, but they can also indicate export issues that require correction. Handling these cases must be done carefully so as not to delete valid information or propagate errors.

Interpreting empty fields

Empty fieldLikely interpretationAction
Fee / CommissionPromotion without fee or fee in another tokenVerify; if correct → 0
PriceSerious export errorCheck original history
AmountSerious export errorCheck original history
Transaction hashNormal in centralized exchangesAcceptable
Notes / MemoOptional field not usedIgnore

Detecting anomalous values

Implement automatic validations that flag suspicious values:

  • Implicit price ±20% vs market: Flag for manual review
  • Negative amounts: Sign error, correct
  • Dates out of range: Corrupted export, re-export
  • Fees higher than 10% of value: Verify (may be legitimate on congested networks)

Example of a legitimate anomaly:

Transaction: Swap 0.01 ETH → 50 USDC
Date: 2024-05-15 (network congestion due to memecoin)
Gas fee: 0.008 ETH (80% of the traded value)

→ It looks anomalous but is correct. Document and keep.

Each anomaly must be investigated individually to determine whether it represents an error that needs to be corrected or an atypical but legitimate transaction.

Phase 7: Consolidating multiple sources

Once each source file has been normalized individually, the next step is to consolidate all the data into a single dataset that represents all your cryptocurrency activity during the tax period.

Unified column schema

ColumnTypeMandatoryDescription
fecha_horadatetimeYesISO 8601, Spain time zone
tipo_operacionstringYesStandardized tax category
cripto_enviadastringDepending on typeSymbol of the asset sold/sent
cantidad_enviadadecimalDepending on typeAmount of the asset sent
cripto_recibidastringDepending on typeSymbol of the asset bought/received
cantidad_recibidadecimalDepending on typeAmount of the asset received
valor_eurdecimalYesTransaction value in EUR
fee_eurdecimalYesTotal fee in EUR
fuentestringYesSource exchange/wallet
hash_txstringNoBlockchain hash if applicable
notasstringNoComments on the process

Consolidation process

  1. Add an origin column to each record before combining
  2. Unify column names according to the standard schema
  3. Combine all files into a single dataset
  4. Sort chronologically by fecha_hora
  5. Run a final deduplication in case there are cross-recorded operations

If consolidation and FIFO calculation are complex for you, our portfolio analysis service includes the entire cleaning, normalization, and calculation process with professional verification at every step.

Phase 8: Cross-validation and integrity checks

Before using the clean dataset to calculate your tax return, it is essential to perform cross-validation to verify the integrity of the processed data.

Three mandatory checks

1. Control balance

Calculate the final balance of each cryptocurrency according to your dataset and compare it with the real balance:

Balance according to dataset BTC: 0.5423 BTC
Real balance in exchanges + wallets: 0.5420 BTC
Difference: 0.0003 BTC (0.05%)

→ Acceptable difference due to rounding

Differences greater than 1% require investigation.

2. Transaction count

Total original records: 2,847
Documented duplicates removed: 156
Records in final dataset: 2,691

2,847 - 156 = 2,691 ✓ Checks out

3. Temporal consistency

Check that there are no unexplained temporal gaps. If you traded actively all year but your dataset shows months with no activity, this may indicate incomplete exports.

This validation phase is where a professional tax filing service adds the most value, as experience allows us to identify error patterns that an occasional user might miss.

Tools to automate cleaning

For users with high transaction volumes, manual processing can be impractical. These tools help automate part of the process:

Tool comparison

ToolRecommended volumeTechnical levelCost
Excel / Google Sheets< 2,000 txBasicFree
Google Sheets + QUERY< 5,000 txIntermediateFree
Python + pandasUnlimitedAdvancedFree
Koinly / CoinTracking< 10,000 txBasicPaid

For Excel/Sheets:

  • REMOVE.DUPLICATES function for pass 1
  • Pivot tables for total verification
  • VLOOKUP to add prices
  • Conditional formatting to detect anomalies

For technical users (Python):

import pandas as pd

# Load and combine sources
df = pd.concat([binance_df, coinbase_df, etherscan_df])

# Normalize dates
df['fecha'] = pd.to_datetime(df['fecha'], utc=True)
df['fecha'] = df['fecha'].dt.tz_convert('Europe/Madrid')

# Remove duplicates by key fields
df = df.drop_duplicates(subset=['fecha', 'par', 'tipo', 'cantidad'])

# Sort chronologically
df = df.sort_values('fecha')

Crypto tax tools such as Koinly include built-in cleaning features, but they apply their own criteria, which may not match the AEAT requirements exactly. We recommend using them as a complement for cross-checking.

Documenting the cleaning process

Every decision taken during the cleaning process must be documented. This documentation serves to reproduce the process if you need to recalculate it, provides evidence for potential AEAT inquiries, and facilitates review by third parties.

Contents of the methodology document

  • Full inventory of sources with export dates
  • Normalization criteria applied to each type of field
  • Mapping table of transaction types
  • Pricing sources used for EUR conversion
  • Record of duplicates removed with justification
  • Anomalies detected and how they were handled
  • Results of integrity checks

Also keep all intermediate files from the process, not just the originals and the final result. If you later detect an error, the intermediate files allow you to identify in which phase it was introduced.

In our portfolio monitoring services, we include full documentation of the analysis process delivered together with the tax report, ensuring complete traceability.

Critical errors you must avoid

There are errors in the cleaning process that have especially serious consequences:

1. Deleting legitimate transactions as "duplicates"

In active trading it is common to execute multiple transactions in the same pair within short time intervals. Check that the transactions are truly identical before deleting them.

2. Incorrect currency conversions

Dividing when you should multiply, or using the inverse exchange rate, generates absurd values that may go unnoticed without balance checks.

3. Ignoring fees

Fees are deductible expenses that reduce your capital gain. A dataset where fees have been lost produces a return that pays more tax than is due.

4. Classifying internal movements as sales

Movements between your own exchanges and wallets are not taxable. If they are classified as sales, they generate fictitious capital gains that you should not be paying.

If you are not confident about any part of the process, our tax advisory team specialized in cryptocurrencies can review your work and detect issues before they impact your tax return.

Conclusion: clean data, correct tax return

Cleaning cryptocurrency datasets is not an optional step or a cosmetic improvement. It is a fundamental requirement for obtaining an accurate tax return that complies with AEAT requirements and that you can defend in the event of an inquiry. A dirty dataset will produce incorrect results regardless of the quality of the software or calculation methodology you use afterwards.

The entire process requires time, attention to detail, and knowledge of both cryptocurrency trading and Spanish tax requirements. For users with simple trading and few transactions, following this guide should be enough. For complex activity with multiple exchanges, significant DeFi operations, or thousands of transactions, professional assistance can save time and prevent costly errors.

At Cleriontax we combine experience in cryptocurrency taxation with rigorous data engineering methodologies to ensure that every report we prepare is based on verified and traceable data. If you prefer to delegate this technical process, our specialized services are designed exactly for that.

Your next step: If you have already exported your data and need help with cleaning, consolidation, or tax calculations, contact our team for a no-obligation initial assessment. If you are still in the export phase, visit our blog where you will find specific guides for each exchange and wallet.

Disclaimer: This article is for informational and educational purposes only. It does not constitute personalized tax advice. Tax regulations are subject to change and each personal situation is unique. Always consult a professional tax advisor before making tax decisions.

Last updated: December 2025

Published by: Cleriontax Team – Experts in Crypto Taxation and Data Analysis

Did you find this article helpful?

Share it with other investors who might need it

Related articles

Continue learning about cryptocurrency taxation

Need help with your cryptocurrency tax return?

Our team of experts can analyze your case and prepare your complete tax return

Request free analysis