

ROI aligning for Mask R-CNN.

This layer performs the ROI align operation on spatial data. The algorithm follows ONNX's own specification, which differs from the one in the original paper. It takes three inputs: input, ROIs and batch_indices.
The shape of input is (b, h, w, c).
The shape of ROIs is (num_roi, 4).
The shape of batch_indices is (num_roi).

The shape of output is (num_roi, output_h, output_w, c).
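The shape relationships above can be sketched as a small check in plain Python. The function name `roi_align_output_shape` is hypothetical and is not part of this layer's API; it only restates the documented shapes.

```python
def roi_align_output_shape(input_shape, rois_shape, batch_indices_shape,
                           output_h, output_w):
    """Validate the three input shapes and return the output shape.

    Hypothetical helper for illustration only; it mirrors the shapes
    documented for this layer and does not perform any pooling.
    """
    if len(input_shape) != 4:
        raise ValueError("input must have shape (b, h, w, c)")
    if len(rois_shape) != 2 or rois_shape[1] != 4:
        raise ValueError("ROIs must have shape (num_roi, 4)")
    num_roi = rois_shape[0]
    if tuple(batch_indices_shape) != (num_roi,):
        raise ValueError("batch_indices must have shape (num_roi)")
    channels = input_shape[3]  # channels-last layout: (b, h, w, c)
    return (num_roi, output_h, output_w, channels)


# Example: 5 ROIs taken from a batch of 2 feature maps, pooled to 7 x 7.
shape = roi_align_output_shape((2, 32, 32, 256), (5, 4), (5,), 7, 7)
print(shape)  # (5, 7, 7, 256)
```

Note that the batch dimension b does not appear in the output: each ROI selects its source image through the corresponding entry of batch_indices.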


  • output_h : The height of the output
  • output_w : The width of the output
  • pooling_mode : The pooling method. One of "average", "max", "onnx_max".
  • sampling_ratio : The number of sampling points used to compute each output pixel. If > 0, a fixed sampling_ratio x sampling_ratio grid is used. If 0 (default), the grid is adaptive: ceil(roi_w / output_w) x ceil(roi_h / output_h).
  • spatial_scale : Multiplicative scale factor that translates ROI coordinates from their input scale to the scale used when pooling. Default is 1.0.
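
As a minimal sketch of how sampling_ratio and spatial_scale interact, the snippet below computes the number of sampling points per output pixel for one ROI. The helper `sampling_grid` is hypothetical, and the ROI layout (x1, y1, x2, y2) is an assumption borrowed from the ONNX RoiAlign convention; the text above only states that each ROI has 4 values. Any clamping of the ROI size that an actual implementation may apply is omitted.

```python
import math


def sampling_grid(roi, output_h, output_w, sampling_ratio=0, spatial_scale=1.0):
    """Return (grid_w, grid_h), the sampling points per output pixel.

    Hypothetical helper for illustration only.
    Assumption: roi is (x1, y1, x2, y2) given in input-scale coordinates.
    """
    # Translate ROI coordinates to the scale used when pooling.
    x1, y1, x2, y2 = (v * spatial_scale for v in roi)
    roi_w = x2 - x1
    roi_h = y2 - y1

    if sampling_ratio > 0:
        # Fixed grid: sampling_ratio x sampling_ratio.
        return sampling_ratio, sampling_ratio

    # Adaptive grid: ceil(roi_w / output_w) x ceil(roi_h / output_h).
    return math.ceil(roi_w / output_w), math.ceil(roi_h / output_h)


# A 16 x 8 ROI scaled by 0.5 becomes 8 x 4; pooled to 2 x 2 output pixels.
print(sampling_grid((0, 0, 16, 8), 2, 2, sampling_ratio=0, spatial_scale=0.5))  # (4, 2)
print(sampling_grid((0, 0, 16, 8), 2, 2, sampling_ratio=2, spatial_scale=0.5))  # (2, 2)
```

With the adaptive setting, larger ROIs are sampled more densely, so the cost per ROI grows with its scaled size; a fixed sampling_ratio keeps the cost per output pixel constant.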