Amazon Price History from Keepa with Python

Andrei Kushniarou
9 min readMay 13, 2024

--

Let’s consider an important task of obtaining the price history of products from Amazon.com. Monitoring prices is crucial for pricing strategies or tracking competitors. Very often, data on products and their prices are the core for building effective analytics. Additional data can be “attached” to the ASIN and date to create an efficient dashboard.

Unfortunately, the Amazon products price history cannot be retrieved from the official Amazon SP-API (which only provides current prices). However, there are several services that continuously scan Amazon and monitor price changes and other parameters, such as the Buy Box. One popular service is keepa.com.

Table of Contents

  • Simple Query
  • Time Conversion
  • Resampling Price Values
  • Handling NaN Values
  • Final Script
  • Simple visualization
  • Saving additional data from Keepa

Simple Query

To retrieve the price history from Keepa.com, we will use Python and the Keepa Python library [1].

First, install libraries:

pip install keepa pandas matplotlib

Next, define the Amazon ASINs that we need to download. I have chosen three ASINs from Amazon Basics products just for example: B071JM699P, B074DZ6NJB, B073Q9PSWD.

Lets make a simple query and receive all data about products from Keepa.

#import necessary libraries
from datetime import datetime, timedelta
import keepa, json

#We need API Key from Keepa
KEEPA_API_KEY = "[INSERT YOUR KEY HERE]"

#Define api object
api = keepa.Keepa(KEEPA_API_KEY)

#Get how many tokens we have
print('Tokens Left:', api.tokens_left)
# If you'll receive an answer like this
# Tokens Left: 300
# ... key is correct and API is working

#Define our asins list
asins_list = ['B071JM699P','B074DZ6NJB','B073Q9PSWD']

#Call product info with prices from Keepa
products = api.query(
asins_list,
progress_bar = False
)

#Iterate over all pairs from prices_data
for product in products:

#Just show info about product
print(f"{product['asin']} | {product['title']}")

#Get prices info
csv = product['csv']
prices_data = csv[0]
print(prices_data)

We’ll have a result like this:

B074DZ6NJB | Amazon Basics Rubber Encased Exercise & Fitness Hex Dumbbell, Hand Weight for Strength Training, 45 lb, Black & Silver
[3658724, 4299, 3666592, 4749, 3674690, 4468, 3675832, ...

Time Conversion

At first glance, this data might seem strange, but it is indeed the correct result because the data is packed. The first number represents the date, and the second number represents the price in cents. To convert a date from Keepa format into a date that we can work with (Unix time), we need to add 21 564 000 and multiply by 60 000. [2]

Lets create a function for time conversion.

#Convert keepa time to unix time
def keepaTimeMinutesToUnixTime(keepaMinutes):
return (21564000 + keepaMinutes) * 60000;

To convert the data, we need to divide all numbers into pairs and process the first number as the date and the second as the price.

...

#Iterate through all products in the result
for product in products:

#Just show info about product
print(f"{product['asin']} | {product['title']}")

#Get prices info
csv = product['csv']
prices_data = csv[0]

# Transform the prices data into a list
transformed_prices = []

#Iterate over all pairs from prices_data
for keepaMinutes, val in zip(prices_data[::2], prices_data[1::2]):

#Convert the date from Keepa minutes to a readable datetime object
date = datetime.utcfromtimestamp(keepaTimeMinutesToUnixTime(keepaMinutes) / 1000)

#Convert the price; if price is not -1, convert cents to dollars
val = val/100 if val>0 else val

#Append the date and price to the list
transformed_prices.append((date,val))

#Display the date and price from the list
for date,val in transformed_prices:
print(date,val)

If we ran the script we’ll have a list with date and price. And this is very beautiful because we got the price history from the very beginning of the product launch!

...
B074DZ6NJB | Amazon Basics Rubber Encased Exercise & Fitness Hex Dumbbell, Hand Weight for Strength Training, 45 lb, Black & Silver
2017-12-15 18:44:00 42.99
2017-12-21 05:52:00 47.49
2017-12-26 20:50:00 44.68
...
2024-04-21 04:22:00 55.27
2024-04-28 23:04:00 54.92
2024-05-06 23:44:00 55.03

A Litte Extra! With Python’s comprehensions we can compact multiple lines to one. Power of Python!

...
#Transform the prices data into a list
transformed_prices = [(datetime.utcfromtimestamp(keepaTimeMinutesToUnixTime(keepaMinutes) / 1000), val/100 if val>100 else val) for
keepaMinutes, val in zip(prices_data[::2], prices_data[1::2])]
...

Resampling Price Values

Important note about data: Keepa only shows the price if there is a change. For instance, if there is a record of the price being $55.27 on April 21 and then $54.92 on April 28, what was the price on April 22, 23, 24, etc.? The answer is $55.27. Therefore, we need to convert the data into a different format to fill in all the gaps.

We need to get the average price if there are multiple records for one day and interpolate missing days. We also need to remember that if there is no data, it is marked as -1; we must not confuse -1 with the real price.

It is convenient to match data from different sources by aggregating them into the same time interval. This could be hourly, daily, weekly, etc. Let’s aggregate the data by days. In fact, data aggregated daily are easy to work with. They don’t take up much space and can be further aggregated by weeks or months, making them easy to compare.

I’ll use Pandas library to process all the data. If Python is a Swiss Army knife, then Pandas is the combine harvester that will go through your field of data and neatly collect it into the “haystacks” of data you need :)

We’ll create a DataFrame with two columns (date and price) from our transformed_prices list and resample it to days.

#Iterate through all products in the result
for product in products:

# Just show info about product
print(f"{product['asin']} | {product['title']}")

# Get prices info
csv = product['csv']
prices_data = csv[0]

#Transform the prices data into a list
transformed_prices = [(datetime.utcfromtimestamp(keepaTimeMinutesToUnixTime(keepaMinutes) / 1000), val/100 if val>100 else val) for
keepaMinutes, val in zip(prices_data[::2], prices_data[1::2])]

#Create dataframe with prices from transformed_prices
df_prices = pd.DataFrame(transformed_prices, columns=['date', 'price'])
df_prices.set_index('date', inplace=True)

#Resample dataframe by days
df_prices = df_prices.resample('D').mean()

#Show dataframe
print(df_prices)

The result:

date             
2017-12-15 42.99
2017-12-16 NaN
2017-12-17 NaN
2017-12-18 NaN
2017-12-19 NaN
... ...
2024-05-02 NaN
2024-05-03 NaN
2024-05-04 NaN
2024-05-05 NaN
2024-05-06 55.03

Handling NaN Values

So, we have NaN values for all days without data. Let’s use Pandas’ ffill method [3] to replace NaN values by propagating the last valid observation forward to the next valid one. Additionally, we'll add a piece of code to extend the last observed price to the current day.

...
# Create dataframe with prices from transformed_prices
df_prices = pd.DataFrame(transformed_prices, columns=['date', 'price'])
df_prices.set_index('date', inplace=True)

# Resample dataframe by days and forward fill to handle NaN values
df_prices = df_prices.resample('D').mean().ffill()

# Extend the DataFrame to include all days up to the current day
if not df_prices.empty:
last_date = df_prices.last_valid_index()
current_date = pd.to_datetime("today").normalize() # Normalize the time to 00:00:00
new_index = pd.date_range(start=df_prices.index.min(), end=current_date, freq='D')
df_prices = df_prices.reindex(new_index, method='ffill') # Reindex and fill all missing days

# Show dataframe (last records)
print(df_prices.tail(20))
2024-04-24  55.27
...
2024-04-27 55.27
2024-04-28 54.92
...
2024-05-05 54.92
2024-05-06 55.03
...
2024-05-13 55.03

Note: When aggregating data, there is a scenario where we might have a -1 (which Keepa uses to indicate no data) and a positive price for the same day. For example, the data:

2023-05-15 00:15:00 51.00
2023-05-15 01:00:00 -1

will result in an average price of $25.

This isn’t quite correct, so when aggregating data, we need to filter out these values for the mean calculation. Instead of: df_prices.resample('D').mean() we will use df_prices.resample('D').apply(lambda x:x[x>-1].mean())

Final Script

Now let’s finish our script. For each processed product, we will save the ASIN and market (default is US) and store it in a CSV file. There is our final script:

#import necessary libraries
from datetime import datetime, timedelta
import keepa, json

#We need API Key from Keepa
KEEPA_API_KEY = "[INSERT YOUR KEY HERE]"

#Define api object
api = keepa.Keepa(KEEPA_API_KEY)

#Get how many tokens we have
print('Tokens Left:', api.tokens_left)
# If you'll receive an answer like this
# Tokens Left: 300
# ... key is correct and API is working

#Define our asins list
asins_list = ['B071JM699P','B074DZ6NJB','B073Q9PSWD']

#Call product info with prices from Keepa
products = api.query(
asins_list,
progress_bar = False
)


#Define result frame with all data
df_result = pd.DataFrame()

#Iterate through all products in the result
for product in products:

# Just show info about product
print(f"{product['asin']} | {product['title']}")

# Get prices info
csv = product['csv']
prices_data = csv[0]

#Transform the prices data into a list
transformed_prices = [(datetime.utcfromtimestamp(keepaTimeMinutesToUnixTime(keepaMinutes) / 1000), val/100 if val>100 else val) for
keepaMinutes, val in zip(prices_data[::2], prices_data[1::2])]

# Create dataframe with prices from transformed_prices
df_prices = pd.DataFrame(transformed_prices, columns=['date', 'price'])
df_prices.set_index('date', inplace=True)

# Resample dataframe by days and forward fill to handle NaN values
df_prices = df_prices.resample('D').apply(lambda x: x[x > -1].mean()).ffill()

# Extend the DataFrame to include all days up to the current day
if not df_prices.empty:
last_date = df_prices.last_valid_index()
current_date = pd.to_datetime("today").normalize() # Normalize the time to 00:00:00
new_index = pd.date_range(start=df_prices.index.min(), end=current_date, freq='D')
df_prices = df_prices.reindex(new_index, method='ffill') # Reindex and fill all missing days

#Add columns about asin and market
df_prices['asin'] = product['asin']
df_prices['market'] = 'us' #need, if we want to have multiple market products in one dataframe
df_prices = df_prices.rename_axis('date').reset_index() #we need to move date from index

#Concat df_prices to one frame
df_result = pd.concat([df_result, df_prices], ignore_index=True)

#Show result frame
print(df_result)

#Save result frame to csv
file_path = 'amazon_asins.csv'
df_result.to_csv(file_path)

And there we have a nice-looking result.

B071JM699P | Amazon Basics Woodcased #2 Pencils, Pre-sharpened, HB Lead Bulk Box, 150 Count, Yellow
B073Q9PSWD | Amazon Basics Enameled Cast Iron Covered Round Dutch Oven, 7.3-Quart, Red
B074DZ6NJB | Amazon Basics Rubber Encased Exercise & Fitness Hex Dumbbell, Hand Weight for Strength Training, 45 lb, Black & Silver

date price asin market
0 2020-01-16 12.49 B071JM699P us
1 2020-01-17 12.49 B071JM699P us
2 2020-01-18 12.49 B071JM699P us
3 2020-01-19 12.49 B071JM699P us
4 2020-01-20 12.49 B071JM699P us
... ... ... ... ...
6151 2024-05-09 55.03 B074DZ6NJB us
6152 2024-05-10 55.03 B074DZ6NJB us
6153 2024-05-11 55.03 B074DZ6NJB us
6154 2024-05-12 55.03 B074DZ6NJB us
6155 2024-05-13 55.03 B074DZ6NJB us

[6156 rows x 4 columns]

Simple visualization

Let’s create a simple visualization of the resulting DataFrame. We will display the price changes of all the obtained products on a single graph for the last year.


#Import library
import matplotlib.pyplot as plt

#Get only last year data
last_year_data = df_result[df_result['date'] >= (df_result['date'].max() - pd.DateOffset(years=1))]

#Define plot
plt.figure(figsize=(12, 6))

#Define plot for every asin
for asin in last_year_data['asin'].unique():
asin_data = last_year_data[last_year_data['asin'] == asin]
plt.plot(asin_data['date'], asin_data['price'], label=asin)

#Set plot label/titles
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Price over the Last Year by ASIN')
plt.legend()
plt.grid(True)

#Save plot to file
plt.savefig('price_over_last_year_by_asin.png')

#or Show the plot

plt.show()

The our result graph:

Price over the Last Year by ASIN from Result Dataframe

Saving additional data from Keepa

You might ask, can we save additional data, not just prices? The answer is yes. For every product, Keepa returns a two-dimensional array with lots of useful parameters [4].

0 — AMAZON: Amazon price history (we use it inprice_data=csv[0])
1 — NEW: Marketplace New price history
2 — USED: Marketplace Used price history
3 — SALES: Sales Rank history (not every product has a Sales Rank; variation items usually don’t have individual sales ranks)

16 — RATING: The product’s rating history (an integer from 0 to 50, e.g., 45 = 4.5 stars)
17 — COUNT_REVIEWS: The product’s rating count history
….
So, we can save not only Amazon prices but also sales rank (csv[3]) or product rating history (csv[16]) and join the data in one DataFrame. In fact, using this data, we can monitor not only the price but also the quality of the product (via rating and reviews) and the Buy Box price to build analytics to understand how to sell better.

About Author

The author of this article is the Founder and CTO of SellerFlux. We are Empowering E-commerce with Data-Driven Solutions and specialize in providing software solutions specifically designed for e-commerce businesses. If you need expert help in building, optimizing, or automating your e-commerce analytics and data processes, don’t hesitate to Сontact us.

--

--

Andrei Kushniarou
Andrei Kushniarou

Written by Andrei Kushniarou

Full-stack developer, BI/data engineer, and entrepreneur with 20+ years of experience. Founder at SellerFlux, empowering e-commerce with data-driven solutions.

No responses yet