Read Data

Use the Python client to read from a Synnax cluster.

The Python client supports several ways of reading data from a cluster. We can read directly from a channel, fetch a range and access its data, or use server-side iterators to process large queries. If you’d like a conceptual overview of how to read data in Synnax, check out the reads page.

Reading from a Channel

The simplest way to read data from Synnax is to use the read method on the Channel class:

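from datetime import datetime

# Assumes `client` is an already-connected Synnax client.
channel = client.channels.retrieve("my_precise_tc")

time_format = "%Y-%m-%d %H:%M:%S"

# Bounds of the time range to read
start = datetime.strptime("2023-02-12 12:30:00", time_format)
end = datetime.strptime("2023-02-12 14:30:00", time_format)

# Read all samples on the channel between start and end
data = channel.read(start, end)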

The returned data is an instance of the Series class, but for all intents and purposes can be treated exactly like a numpy.ndarray. For example, we can perform vectorized operations on the data:

data = data - 273.15

The Series class does give us some additional functionality. Most notably, we can get the time range occupied by the data:

tr = data.time_range
print(tr)
# 2023-02-12 12:30:00 - 2023-02-12 14:30:00

This attribute is important because data does not always exist for the entire time range queried.
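Because the returned range can be narrower than the one requested, it can be worth checking coverage before relying on the data. The sketch below works on plain datetime values and uses a hypothetical helper, covered_fraction, that is not part of the Synnax client. It assumes the start and end of data.time_range have already been obtained as datetime objects; that conversion is not shown here, and the example bounds are made up.

```python
from datetime import datetime


def covered_fraction(req_start, req_end, got_start, got_end):
    """Return the fraction of [req_start, req_end] overlapped by [got_start, got_end].

    Hypothetical helper for illustration; not part of the Synnax client.
    """
    requested = (req_end - req_start).total_seconds()
    if requested <= 0:
        raise ValueError("requested range must have positive duration")
    overlap_start = max(req_start, got_start)
    overlap_end = min(req_end, got_end)
    # Clamp to zero when the two ranges do not overlap at all.
    overlap = max((overlap_end - overlap_start).total_seconds(), 0.0)
    return overlap / requested


time_format = "%Y-%m-%d %H:%M:%S"
req_start = datetime.strptime("2023-02-12 12:30:00", time_format)
req_end = datetime.strptime("2023-02-12 14:30:00", time_format)
# Assumed example: data only exists for the first hour of the query.
got_start = datetime.strptime("2023-02-12 12:30:00", time_format)
got_end = datetime.strptime("2023-02-12 13:30:00", time_format)

fraction = covered_fraction(req_start, req_end, got_start, got_end)
print(f"{fraction:.0%} of the requested range has data")
```

A fraction below 1.0 signals that part of the queried range holds no data, which may matter before computing averages or plotting against a fixed time axis.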

Reading from Multiple Channels

We can also read from multiple channels at once by calling the read method on the client. This method takes a time range followed by a list of channel names or keys:

frame = client.read(start, end, ["my_precise_tc", "time"])

The returned data is an instance of the Frame class. We can access the Series for a given channel by using the [] operator on the Frame:

data = frame["my_precise_tc"]

We can also convert the Frame to a pandas.DataFrame by calling the to_df method:

df = frame.to_df()

Reading Channel Data from a Range

While the above methods are useful for executing precise reads, they require us to know the exact range of time we’re interested in reading. Ranges are a useful way of categorizing important time ranges in a cluster’s data. We can read directly from these ranges.

We can access channels on a Range object and call read on them to access their data:

rng = client.ranges.retrieve("My Interesting Test")

# Read the data from the channel
data = rng.my_precise_tc.read()

data = data - 273.15

It turns out that we don’t even need to call the read method at all. We can just use the channel name directly to perform operations on the data:

data = rng.my_precise_tc - 273.15

We can also plot the data just as easily:

import matplotlib.pyplot as plt

# Plot time on the x-axis and temperature on the y-axis
plt.plot(rng.time, rng.my_precise_tc)

Reading with Iterators

Single-channel, multi-channel, and range-based reads will cover most use cases, but some situations require processing large volumes of data, and these reads may be too large to fit in memory.

Synnax supports server-side iterators that allow us to process large queries in consistently sized chunks. By default, Synnax uses a chunk size of 100,000 samples. To configure a custom chunk size, pass the chunk_size argument to the open_iterator method with the desired number of samples per iteration.

with client.open_iterator(start, end, "my_precise_tc", chunk_size=100) as it:
    for frame in it:
        # Do something with the frame; here we simply print it
        print(frame)
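
One common reason to iterate in chunks is to compute an aggregate without holding the whole query in memory. The sketch below shows that pattern with a running mean. It is not Synnax-specific: fake_chunks is a hypothetical stand-in for the frames yielded by the iterator, each chunk is a plain list of floats rather than a Series, and the sample values are made up.

```python
def fake_chunks(samples, chunk_size):
    """Yield consecutive slices of `samples`, mimicking a chunked iterator.

    Hypothetical stand-in for illustration; not part of the Synnax client.
    """
    for i in range(0, len(samples), chunk_size):
        yield samples[i : i + chunk_size]


def running_mean(chunks):
    """Compute the mean over all chunks while keeping only two accumulators."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no samples to average")
    return total / count


# Assumed example data: temperatures in Kelvin.
kelvin = [293.15, 294.15, 295.15, 296.15, 297.15]
mean_celsius = running_mean(fake_chunks(kelvin, chunk_size=2)) - 273.15
print(round(mean_celsius, 2))
```

With the real iterator, the same accumulation would presumably happen inside the `for frame in it` loop, using the data for the channel of interest in place of each list. Only the two accumulators persist between iterations, so memory use stays bounded by the chunk size.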