Infer entity paths from LeRobot dataset feature metadata #8981

Merged 1 commit on Feb 10, 2025
crates/store/re_data_loader/src/lerobot.rs (23 additions, 0 deletions)

@@ -352,6 +352,7 @@ impl LeRobotDatasetInfo {
pub struct Feature {
    pub dtype: DType,
    pub shape: Vec<usize>,
+   pub names: Option<Names>,
}

/// Data types supported for features in a `LeRobot` dataset.

@@ -366,6 +367,28 @@ pub enum DType {
    Int64,
}

/// Name metadata for a feature in the `LeRobot` dataset.
///
/// The name metadata can consist of
/// - A flat list of names for each dimension of a feature (e.g., `["height", "width", "channel"]`).
/// - A list specific to motors (e.g., `{ "motors": ["motor_0", "motor_1", ...] }`).
#[derive(Debug, Serialize, Deserialize)]
#[serde(untagged)]
pub enum Names {
    Motors { motors: Vec<String> },
[Inline review comment — Member]
It's not just Motors(Vec<String>) to reflect what it looks like in the dataset? Otherwise this looks strangely inconsistent with the List(Vec<_>) below.

[Inline review reply — Member, PR author]
Nope, I've seen these two variants:

"names": {
    "motors": [
        "motor_0",
        "motor_1",
        "motor_2",
        "motor_3",
        "motor_4",
        "motor_5",
        "motor_6",
        "motor_7"
    ]
}

or

"names": [
    "height",
    "width",
    "channel"
]

[Inline review reply — Member]
Sounds like yes, this enum closely mirrors what shows up in the datasets :)

    List(Vec<String>),
}

impl Names {
    /// Retrieves the name corresponding to a specific index within the `names` field of a feature.
    pub fn name_for_index(&self, index: usize) -> Option<&String> {
        match self {
            Self::Motors { motors } => motors.get(index),
            Self::List(items) => items.get(index),
        }
    }
}
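To see how the two variants behave identically at the call site, here is a minimal standalone sketch of the lookup logic (the serde derives from the actual crate are omitted so the snippet compiles on its own):

```rust
// Standalone sketch of the Names enum and its index lookup (serde derives omitted).
enum Names {
    Motors { motors: Vec<String> },
    List(Vec<String>),
}

impl Names {
    fn name_for_index(&self, index: usize) -> Option<&String> {
        match self {
            Self::Motors { motors } => motors.get(index),
            Self::List(items) => items.get(index),
        }
    }
}

fn main() {
    let motors = Names::Motors {
        motors: vec!["motor_0".to_owned(), "motor_1".to_owned()],
    };
    let dims = Names::List(vec![
        "height".to_owned(),
        "width".to_owned(),
        "channel".to_owned(),
    ]);

    // Both variants answer the same question: "what is dimension `i` called?"
    assert_eq!(motors.name_for_index(1).map(String::as_str), Some("motor_1"));
    assert_eq!(dims.name_for_index(2).map(String::as_str), Some("channel"));
    // Out-of-range indices yield None rather than panicking.
    assert_eq!(motors.name_for_index(5), None);
    println!("ok");
}
```

The untagged serde representation on the real enum is what lets both JSON shapes shown in the review thread deserialize into one type.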

// TODO(gijsd): Do we want to stream in episodes or tasks?
#[cfg(not(target_arch = "wasm32"))]
fn load_jsonl_file<D>(filepath: impl AsRef<Path>) -> Result<Vec<D>, LeRobotError>

crates/store/re_data_loader/src/loader_lerobot.rs (20 additions, 9 deletions)
@@ -16,7 +16,7 @@
use re_types::archetypes::{AssetVideo, EncodedImage, VideoFrameReference};
use re_types::components::{Scalar, VideoTimestamp};
use re_types::{Archetype, Component, ComponentBatch};

-use crate::lerobot::{is_le_robot_dataset, DType, EpisodeIndex, LeRobotDataset};
+use crate::lerobot::{is_le_robot_dataset, DType, EpisodeIndex, Feature, LeRobotDataset};
use crate::{DataLoader, DataLoaderError, LoadedData};

/// Columns in the `LeRobot` dataset schema that we do not visualize in the viewer, and thus ignore.
@@ -112,7 +112,7 @@ impl DataLoader for LeRobotDatasetLoader {
                );
            }
            DType::Float32 | DType::Float64 => {
-               chunks.extend(load_scalar(feature_key, &timelines, &data)?);
+               chunks.extend(load_scalar(feature_key, feature, &timelines, &data)?);
            }
        }
    }
@@ -269,31 +269,35 @@ impl Iterator for ScalarChunkIterator {
impl ExactSizeIterator for ScalarChunkIterator {}

fn load_scalar(
-   feature: &str,
+   feature_key: &str,
+   feature: &Feature,
    timelines: &IntMap<Timeline, TimeColumn>,
    data: &RecordBatch,
) -> Result<ScalarChunkIterator, DataLoaderError> {
    let field = data
        .schema_ref()
-       .field_with_name(feature)
-       .with_context(|| format!("Failed to get field for feature {feature} from parquet file"))?;
+       .field_with_name(feature_key)
+       .with_context(|| {
+           format!("Failed to get field for feature {feature_key} from parquet file")
+       })?;

    match field.data_type() {
        DataType::FixedSizeList(_, _) => {
            let fixed_size_array = data
-               .column_by_name(feature)
+               .column_by_name(feature_key)
                .and_then(|col| col.downcast_array_ref::<FixedSizeListArray>())
                .ok_or_else(|| {
                    DataLoaderError::Other(anyhow!(
                        "Failed to downcast feature to FixedSizeListArray"
                    ))
                })?;

-           let batch_chunks = make_scalar_batch_entity_chunks(field, timelines, fixed_size_array)?;
+           let batch_chunks =
+               make_scalar_batch_entity_chunks(field, feature, timelines, fixed_size_array)?;
            Ok(ScalarChunkIterator::Batch(Box::new(batch_chunks)))
        }
        DataType::Float32 => {
-           let feature_data = data.column_by_name(feature).ok_or_else(|| {
+           let feature_data = data.column_by_name(feature_key).ok_or_else(|| {
                DataLoaderError::Other(anyhow!(
                    "Failed to get LeRobot dataset column data for: {:?}",
                    field.name()

@@ -321,6 +325,7 @@

fn make_scalar_batch_entity_chunks(
    field: &Field,
+   feature: &Feature,
    timelines: &IntMap<Timeline, TimeColumn>,
    data: &FixedSizeListArray,
) -> Result<impl ExactSizeIterator<Item = Chunk>, DataLoaderError> {

@@ -330,7 +335,13 @@
    let mut chunks = Vec::with_capacity(num_elements);

    for idx in 0..num_elements {
-       let entity_path = format!("{}/{idx}", field.name());
+       let name = feature
+           .names
+           .as_ref()
+           .and_then(|names| names.name_for_index(idx).cloned())
+           .unwrap_or(format!("{idx}"));
+
+       let entity_path = format!("{}/{name}", field.name());
        chunks.push(make_scalar_entity_chunk(
            entity_path.into(),
            timelines,
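The net effect of the loop above can be sketched standalone: when name metadata is present, each scalar column gets a human-readable entity path segment; otherwise the loader falls back to the numeric index. This is a minimal illustration, using plain strings rather than Rerun's EntityPath type, and a `&[&str]` slice standing in for the resolved names:

```rust
// Sketch of the entity-path fallback logic from make_scalar_batch_entity_chunks:
// plain strings instead of Rerun's EntityPath; `names` stands in for the
// Option<Names> metadata on Feature.
fn entity_path_for(field_name: &str, names: Option<&[&str]>, idx: usize) -> String {
    let name = names
        .and_then(|n| n.get(idx).map(|s| (*s).to_owned()))
        .unwrap_or(format!("{idx}"));
    format!("{field_name}/{name}")
}

fn main() {
    // With name metadata: readable per-motor entity paths.
    let motors = ["motor_0", "motor_1"];
    assert_eq!(
        entity_path_for("observation.state", Some(&motors), 1),
        "observation.state/motor_1"
    );
    // Without metadata (or index out of range): fall back to the index,
    // matching the pre-PR behavior.
    assert_eq!(entity_path_for("observation.state", None, 1), "observation.state/1");
    println!("ok");
}
```

The fallback keeps datasets without name metadata loading exactly as before this change.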