Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add segsort from moderngpu #181

Merged
merged 13 commits into from
Sep 26, 2020
Merged

Conversation

csukuangfj
Copy link
Collaborator

Closes #176


namespace {

class ModernGpuContext : public mgpu::standard_context_t {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if this should be called ModernGpuAllocator to clarify that it's not an instance of k2::Context.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Collaborator

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was fast! Thanks!

@csukuangfj
Copy link
Collaborator Author

There exists an issue for the current commit:

k2/k2/csrc/ragged_inl.h

Lines 114 to 129 in 5a9fc1c

mgpu::segmented_sort_indices(src->values.Data(), // keys
order->Data(), // indices
src->values.Dim(), // count
segment.Data() + 1, // segments
segment.Dim() - 1, // num_segments
Op(), // cmp
*context); // context
auto err = cudaGetLastError();
(void)err;
// TODO(fangjun): err is not cudaSuccess, but why was the data sorted
// correctly?
//
// Check failed: err == cudaSuccess (9 vs. 0) Error: invalid configuration
// argument.
//
// K2_DCHECK_CUDA_ERROR(err);

The cuda error code for mgpu::segmented_sort_indices is not cudaSuccess, but it sorts the data correctly.

@danpovey
Copy link
Collaborator

Thanks!!
At some point it should be possible to implement ArcSort (see fsa_algo.h).

@danpovey danpovey merged commit ee562fc into k2-fsa:master Sep 26, 2020
std::unique_ptr<mgpu::context_t> context =
GetModernGpuAllocator(src->Context()->GetDeviceId());

Array1<int32_t> &segment = src->shape.RowSplits(src->NumAxes() - 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to add K2_CHECK_EQ(src->shape.NumAxes() == 2 at entry as we just sort values in src

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It supports ragged arrays with more than two axes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at least check >=2 here otherwise there will be an error when calling shape.RowSplits(NumAxes() - 1), note that you should add such requirements in the documentation

k2/k2/csrc/ragged.h

Lines 577 to 584 in a178524

Sort a ragged array in-place.
@param [inout] The input array to be sorted.
CAUTION: it is sorted in-place.
@param [out] The indexes mapping from the sorted
array to the input array. The caller
has to pre-allocate memory for it
on the same device as `src`.

BTW, in other APIs, such as MaxPerSublist, we require NumAxes() ==2 now. @danpovey I think we should make those api consistent, should we require exactly NumAxes() == 2 or NumAxes() >=2

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'd like to make the APIs as general as they can be made without requiring extra work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has no requirement about the number of axes a ragged array has. It always sorts the last axis.
This function works as long as the ragged array is not empty.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm.. definitely it requires NumAxes >=2...It's meaningless to call RowSplits(1) on a shape with NumAxes < 2, actually it will crash..

Then I suggest we add documentation and requirements for those APIs with axes >= 2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give an example to show what a ragged array with NumAxes < 2 looks like?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm, the point here is, now we allow user define a empty RaggedShape technically,

RaggedShape() = default;
, so it's better if we can check this and report error as early as possible (i.e. at entry of the function).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a K2_DCHECK in the next commit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to add K2_CHECK_EQ(src->shape.NumAxes() == 2 at entry as we just sort values in src

@qindazhu
fixed in #218

K2_DCHECK_GE(src->NumAxes(), 2);

mgpu::segmented_sort_indices(src->values.Data(), // keys
order->Data(), // indices
src->values.Dim(), // count
segment.Data() + 1, // segments
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why +1 here? the api of mgpu is strange,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tried segment.Data() and got the same result.

According to https://github.com/moderngpu/moderngpu/blob/2b3985541c8e88a133769598c406c33ddde9d0a5/tests/test_segsort.cu#L9-L15

 std::vector<int> cpu_segsort(const std::vector<int>& data,
  const std::vector<int>& segments) {

  std::vector<int> copy = data;
  int cur = 0;
  for(int seg = 0; seg < segments.size(); ++seg) {
    int next = segments[seg];
    std::sort(copy.data() + cur, copy.data() + next);
    cur = next;
  }
  std::sort(copy.data() + cur, copy.data() + data.size());
  return copy;
}

The start index of the first segment is 0. That's why I use segment.Data() + 1.

@danpovey
Copy link
Collaborator

danpovey commented Sep 27, 2020 via email

@csukuangfj csukuangfj deleted the fangjun-moderngpu branch September 29, 2020 13:52
@csukuangfj
Copy link
Collaborator Author

At some point it should be possible to implement ArcSort (see fsa_algo.h).

@danpovey

Are arcs sorted by ilabel or by olabel?

@danpovey
Copy link
Collaborator

danpovey commented Oct 4, 2020 via email

@danpovey
Copy link
Collaborator

danpovey commented Oct 4, 2020 via email

@danpovey
Copy link
Collaborator

danpovey commented Oct 4, 2020 via email

@danpovey
Copy link
Collaborator

danpovey commented Oct 4, 2020 via email

@csukuangfj
Copy link
Collaborator Author

Guys, IMO it would have been more convenient to allow SortSublists to
accept arrays with any num-axes but operate on the last one.

This is what the current implementation does, except that it requires a NON-empty input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort with operator
3 participants