
Run and Profiling Commands


The softneuro run command runs inference on a DNN file. It can output the inference results and report execution times.


usage: softneuro run [-o ONPY]... [-p PASSWORD] [--recipe RECIPE] [--batch BATCH]
                     [--ishape SHAPE] [--keep_img_ar PADDINGCOLOR]
                     [--img_resize_mode RESIZEMODE] [--thread NTHREADS] [--noboost]
                     [--affinity MASK[@THREAD_INDICES]]
                     [-r ROUTINE[@LAYER_INDICES]] [-R RPARAMS[@LAYER_INDICES]] [--nobufopt DEVICE]
                     [--lib LIB] [-l LNUM] [--detail] [--detail2] [--bylayer]
                     [--dump DUMPDIR] [--dump2 DUMPDIR] [-t NTOPS] [-h]
                     DNN [INPUT [INPUT ...]]


Argument Description
DNN DNN file for inference execution.
INPUT Input for inference execution. Can be a numpy file or an image file. If not provided, input will be uniform random numbers from [-1, 1].


Flag Description
-p PASSWORD, --pass PASSWORD Password to run an encrypted DNN file.
-o ONPY File name to output the inference results as a numpy file.
--recipe RECIPE Sets the DNN recipe file.
--batch BATCH Input batch size.
--ishape SHAPE Input shape. (Example: 1x224x224x3).
--keep_img_ar PADDINGCOLOR Keeps aspect ratio when resizing input image. The aspect ratio is not kept by default. Margin space is filled with the color specified by PADDINGCOLOR. PADDINGCOLOR can be specified by RGB value, for example, '0, 0, 0'.
--img_resize_mode RESIZEMODE Specifies the resizing mode. 'bilinear' or 'nearest' can be specified. Default is 'bilinear'.
--thread NTHREADS How many threads should be used for execution. Defaults to the number of CPU cores.
--noboost Makes threads wait on a condition variable while they are waiting for a task. If this isn't set, waiting threads busy-wait.
--affinity MASK[@THREAD_INDICES] Use the affinity mask given by MASK on the threads given by THREAD_INDICES.
MASK should be a little endian hexadecimal (0x..), binary (0b..), or decimal number.
If THREAD_INDICES isn't set, all threads will use the given mask.
For more information on THREAD_INDICES use the softneuro help thread_indices command.
-r, --routine ROUTINE[@LAYER_INDICES] Sets the routines to be used. If not set, the routines that are usually fastest among those available are chosen (e.g. CUDA if there's CUDA support). The default is cpu.
If the model is tuned, this setting is ignored.
If LAYER_INDICES isn't set, the routine is applied to all layers in the main net.
For more information on LAYER_INDICES use the softneuro help layer_indices command.
-R RPARAMS[@LAYER_INDICES], --rparams RPARAMS[@LAYER_INDICES] Sets the routine parameters to be used.
If the model is tuned, this setting is ignored.
If LAYER_INDICES isn't set, the parameters are applied to all layers in the main net.
For more information on LAYER_INDICES use the softneuro help layer_indices command.
--nobufopt DEVICE Disable the buffer optimizer for routines that run on the given device.
--lib LIB Set an OpenCL binary file when using online compilation.
-l, --loop LNUM Run inference LNUM times for benchmarking.
--detail Show detailed inference statistics.
--detail2 Show even more detailed inference statistics.
If a layer is made up of other layers this shows the processing times of the internal layers as well.
--bylayer Show detailed inference statistics by layer.
--dump DUMPDIR Dump each layer output as a numpy file in the given folder.
--dump2 DUMPDIR Dump each layer output and internal layer outputs if they exist in the given folder.
-t, --top NTOPS Shows the top NTOPS scores and labels for image classification models.
-h, --help Shows the command help.
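
To make the --affinity MASK notation more concrete, here is a minimal Python sketch that splits a MASK[@THREAD_INDICES] argument and lists the CPU cores a mask selects. It assumes the common convention that bit i of the mask, counted from the least significant bit, stands for logical CPU i. The helper names are hypothetical and not part of SoftNeuro, and the THREAD_INDICES part is kept as raw text because its format is described by softneuro help thread_indices.

```python
def split_affinity(arg):
    """Split 'MASK[@THREAD_INDICES]' into (mask as int, raw indices or None)."""
    mask_text, sep, indices = arg.partition("@")
    # int(..., 0) accepts 0x.. (hexadecimal), 0b.. (binary) and plain decimal.
    mask = int(mask_text, 0)
    return mask, (indices if sep else None)


def cores_in_mask(mask):
    """Assumption: bit i (least significant first) selects logical CPU i."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


for arg in ("0xf0@0..3", "0x0f@4..7", "0b1010", "12"):
    mask, indices = split_affinity(arg)
    print(arg, "->", cores_in_mask(mask), "threads:", indices or "all")
```

Under that assumption, 0xf0 selects cores 4 to 7 and 0x0f selects cores 0 to 3, so two --affinity flags with different THREAD_INDICES can place two groups of threads on two disjoint sets of cores.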


$ softneuro run densenet121.dnn --thread 8 --affinity 0xf0@0..3 --affinity 0x0f@4..7 --top 5 --loop 10 shovel.jpg

Top 5 Labels
1  0.9999  shovel
2  0.0001  hatchet
3  0.0000  broom
4  0.0000  swab
5  0.0000  spatula

FUNCTION       AVE(us)  MIN(us)  MAX(us)  #RUN
Dnn_load()      43,070   43,070   43,070     1
Dnn_compile()   28,567   28,567   28,567     1
Dnn_forward()   39,877   39,751   39,983    10

Used memory: 88,403,968 Bytes

preprocess: 81 68 68 69 70 64 71 69 70 68
main: 39872 39698 39710 39679 39896 39884 39770 39801 39908 39824
TOTAL: 39955 39767 39778 39749 39968 39949 39842 39873 39981 39894

The inference time is given by Dnn_forward. AVE, MIN and MAX are, respectively, the average, minimum and maximum execution times for the number of runs shown under #RUN.
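
As a rough illustration of how the AVE, MIN and MAX columns relate to individual runs, the sketch below parses the per-run lines from the example output and computes the same statistics in Python. The function name is hypothetical, and the per-run values are assumed to be in microseconds like the FUNCTION table.

```python
from statistics import mean

# Per-run lines copied from the example output above
# (assumed to be microseconds, like the FUNCTION table).
report = """\
preprocess: 81 68 68 69 70 64 71 69 70 68
main: 39872 39698 39710 39679 39896 39884 39770 39801 39908 39824
TOTAL: 39955 39767 39778 39749 39968 39949 39842 39873 39981 39894
"""


def parse_runs(text):
    """Map each label ('preprocess', 'main', 'TOTAL') to its list of times."""
    runs = {}
    for line in text.splitlines():
        label, _, values = line.partition(":")
        runs[label.strip()] = [int(v) for v in values.split()]
    return runs


runs = parse_runs(report)
for label, times in runs.items():
    print(f"{label:<10} ave={mean(times):.1f} min={min(times)} "
          f"max={max(times)} runs={len(times)}")
```

For the TOTAL line this gives an average of 39875.6, a minimum of 39749 and a maximum of 39981 over 10 runs. These values are close to, but not identical with, the Dnn_forward() row, presumably because the two are measured at slightly different points.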


The softneuro init command initializes profiling data for a DNN file.


usage: softneuro init [--thread NTHREADS] [--affinity MASK[@THREAD_INDICES]]
                      [--pass PASSWORD] [--help]
                      PROF DNN


Argument Description
PROF Directory where the profiling data will be initialized.
DNN DNN file to be profiled.


Flag Description
--thread NTHREADS How many threads should be used for execution. Defaults to the number of CPU cores.
--affinity MASK[@THREAD_INDICES] Use the affinity mask given by MASK on the threads given by THREAD_INDICES.
MASK should be a little endian hexadecimal (0x..), binary (0b..), or decimal number.
If THREAD_INDICES isn't set, all threads will use the given mask.
For more information on THREAD_INDICES use the softneuro help thread_indices command.
--pass PASSWORD Password to profile an encrypted DNN file.
-h, --help Shows the command help.

The following command creates the mobilenet_prof directory with profiling data.
Note: there is no terminal output.

$ softneuro init mobilenet_prof mobilenet.dnn


The softneuro add command assigns routines to layers present in the profiling data.


usage: softneuro add [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--help]
                     PROF [ROUTINE[@LAYER_INDICES]]...


Argument Description
PROF Directory containing profiling data.
ROUTINE[@LAYER_INDICES] Set the routine given by ROUTINE to the layers given by LAYER_INDICES.
If LAYER_INDICES isn't set, the routine will be set to all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command.


Flag Description
--dnn DNN A DNN file used to create profiling data.
-p, --pass PASSWORD The password required to use the prof file.
--ref REF The reference DNN file when profiling a secret DNN.
--ref-pass REF_PASSWORD The password for REF.
-h, --help Shows the command help.

Set all main network layers to use the cpu:qint8 routine, if supported.

$ softneuro add mobilenet_prof cpu:qint8
adding routines...done.

$ softneuro status mobilenet_prof
0  ? (source)
1  ? (madd)
2  ? (sink)

 #  NAME                                              ROUTINE        TIME  DESC  PARAMS
 0  input_1 (source)
 1  conv1 (conv2)                                     cpu:qint8 (9)
 2  conv_dw_1 (depthwise_conv2)                       cpu:qint8 (3)
 3  conv_pw_1 (conv2)                                 cpu:qint8 (9)
 4  conv_dw_2 (depthwise_conv2)                       cpu:qint8 (3)
 5  conv_pw_2 (conv2)                                 cpu:qint8 (9)
 6  conv_dw_3 (depthwise_conv2)                       cpu:qint8 (3)
 7  conv_pw_3 (conv2)                                 cpu:qint8 (9)
 8  conv_dw_4 (depthwise_conv2)                       cpu:qint8 (3)
 9  conv_pw_4 (conv2)                                 cpu:qint8 (9)
10  conv_dw_5 (depthwise_conv2)                       cpu:qint8 (3)
11  conv_pw_5 (conv2)                                 cpu:qint8 (9)
12  conv_dw_6 (depthwise_conv2)                       cpu:qint8 (3)
13  conv_pw_6 (conv2)                                 cpu:qint8 (9)
14  conv_dw_7 (depthwise_conv2)                       cpu:qint8 (3)
15  conv_pw_7 (conv2)                                 cpu:qint8 (9)
16  conv_dw_8 (depthwise_conv2)                       cpu:qint8 (3)
17  conv_pw_8 (conv2)                                 cpu:qint8 (9)
18  conv_dw_9 (depthwise_conv2)                       cpu:qint8 (3)
19  conv_pw_9 (conv2)                                 cpu:qint8 (9)
20  conv_dw_10 (depthwise_conv2)                      cpu:qint8 (3)
21  conv_pw_10 (conv2)                                cpu:qint8 (9)
22  conv_dw_11 (depthwise_conv2)                      cpu:qint8 (3)
23  conv_pw_11 (conv2)                                cpu:qint8 (9)
24  conv_dw_12 (depthwise_conv2)                      cpu:qint8 (3)
25  conv_pw_12 (conv2)                                cpu:qint8 (9)
26  conv_dw_13 (depthwise_conv2)                      cpu:qint8 (3)
27  conv_pw_13 (conv2)                                cpu:qint8 (9)
28  global_average_pooling2d_1 (global_average_pool)
29  reshape_1 (reshape)                               cpu:qint8 (1)
30  conv_preds (conv2)                                cpu:qint8 (9)
31  act_softmax (softmax)
32  reshape_2 (reshape)                               cpu:qint8 (1)
33  sink_0 (sink)

ROUTINES  cpu:qint8
TOTAL     ?


The softneuro rm command removes the profiling information for the given routines.


usage: softneuro rm [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--help]
                    PROF [ROUTINE[@LAYER_INDICES]]...


Argument Description
PROF Directory containing profiling data.
ROUTINE[@LAYER_INDICES] Remove the routine given by ROUTINE from the layers given by LAYER_INDICES.
If LAYER_INDICES isn't set, the routine will be removed from all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command.


Flag Description
--dnn DNN A DNN file used to create profiling data.
-p PASSWORD, --pass PASSWORD The password required to use the prof file.
--ref REF The reference DNN file when profiling a secret DNN.
--ref-pass REF_PASSWORD The password for REF.
-h, --help Shows the command help.

Remove all routine settings from mobilenet_prof.

$ softneuro rm mobilenet_prof
removing routines...done.

$ softneuro status mobilenet_prof
0  ? (source)
1  ? (madd)
2  ? (sink)

 #  NAME                                              ROUTINE  TIME  DESC  PARAMS
 0  input_1 (source)
 1  conv1 (conv2)
 2  conv_dw_1 (depthwise_conv2)
 3  conv_pw_1 (conv2)
 4  conv_dw_2 (depthwise_conv2)
 5  conv_pw_2 (conv2)
 6  conv_dw_3 (depthwise_conv2)
 7  conv_pw_3 (conv2)
 8  conv_dw_4 (depthwise_conv2)
 9  conv_pw_4 (conv2)
10  conv_dw_5 (depthwise_conv2)
11  conv_pw_5 (conv2)
12  conv_dw_6 (depthwise_conv2)
13  conv_pw_6 (conv2)
14  conv_dw_7 (depthwise_conv2)
15  conv_pw_7 (conv2)
16  conv_dw_8 (depthwise_conv2)
17  conv_pw_8 (conv2)
18  conv_dw_9 (depthwise_conv2)
19  conv_pw_9 (conv2)
20  conv_dw_10 (depthwise_conv2)
21  conv_pw_10 (conv2)
22  conv_dw_11 (depthwise_conv2)
23  conv_pw_11 (conv2)
24  conv_dw_12 (depthwise_conv2)
25  conv_pw_12 (conv2)
26  conv_dw_13 (depthwise_conv2)
27  conv_pw_13 (conv2)
28  global_average_pooling2d_1 (global_average_pool)
29  reshape_1 (reshape)
30  conv_preds (conv2)
31  act_softmax (softmax)
32  reshape_2 (reshape)
33  sink_0 (sink)



The softneuro reset command resets the profiling data to its defaults.


usage: softneuro reset [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--help]
                       PROF [ROUTINE[@LAYER_INDICES]]...


Argument Description
PROF Directory containing profiling data.
ROUTINE[@LAYER_INDICES] Reset the routine given by ROUTINE on the layers given by LAYER_INDICES.
If LAYER_INDICES isn't set, the routine will be reset on all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command.


Flag Description
--dnn DNN A DNN file used to create profiling data.
-p PASSWORD, --pass PASSWORD The password required to use the prof file.
--ref REF The reference DNN file when profiling a secret DNN.
--ref-pass REF_PASSWORD The password for REF.
-h, --help Shows the command help.

Reset the mobilenet_prof profiling data.

$ softneuro reset mobilenet_prof
resetting routines...done.
$ softneuro status mobilenet_prof
0  ? (source)
1  ? (madd)      cpu (3)
2  ? (sink)

 #  NAME                                              ROUTINE   TIME  DESC  PARAMS
 0  input_1 (source)
 1  conv1 (conv2)                                     cpu (15)
 2  conv_dw_1 (depthwise_conv2)                       cpu (3)
 3  conv_pw_1 (conv2)                                 cpu (47)
 4  conv_dw_2 (depthwise_conv2)                       cpu (3)
 5  conv_pw_2 (conv2)                                 cpu (47)
 6  conv_dw_3 (depthwise_conv2)                       cpu (3)
 7  conv_pw_3 (conv2)                                 cpu (47)
 8  conv_dw_4 (depthwise_conv2)                       cpu (3)
 9  conv_pw_4 (conv2)                                 cpu (47)
10  conv_dw_5 (depthwise_conv2)                       cpu (3)
11  conv_pw_5 (conv2)                                 cpu (47)
12  conv_dw_6 (depthwise_conv2)                       cpu (3)
13  conv_pw_6 (conv2)                                 cpu (47)
14  conv_dw_7 (depthwise_conv2)                       cpu (3)
15  conv_pw_7 (conv2)                                 cpu (47)
16  conv_dw_8 (depthwise_conv2)                       cpu (3)
17  conv_pw_8 (conv2)                                 cpu (47)
18  conv_dw_9 (depthwise_conv2)                       cpu (3)
19  conv_pw_9 (conv2)                                 cpu (47)
20  conv_dw_10 (depthwise_conv2)                      cpu (3)
21  conv_pw_10 (conv2)                                cpu (47)
22  conv_dw_11 (depthwise_conv2)                      cpu (3)
23  conv_pw_11 (conv2)                                cpu (47)
24  conv_dw_12 (depthwise_conv2)                      cpu (3)
25  conv_pw_12 (conv2)                                cpu (47)
26  conv_dw_13 (depthwise_conv2)                      cpu (3)
27  conv_pw_13 (conv2)                                cpu (47)
28  global_average_pooling2d_1 (global_average_pool)  cpu (1)
29  reshape_1 (reshape)                               cpu (1)
30  conv_preds (conv2)                                cpu (47)
31  act_softmax (softmax)                             cpu (1)
32  reshape_2 (reshape)                               cpu (1)
33  sink_0 (sink)

TOTAL     ?


The softneuro status command shows the routines, parameters and measured profiling times for each layer.


usage: softneuro status [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--at INDEX]
                        [--estimate MODE] [--csv] [--help] PROF


Argument Description
PROF Directory containing profiling data.


Flag Description
--dnn DNN A DNN file used to create profiling data.
-p, --pass PASSWORD The password required to use the encrypted prof file.
--ref REF The reference DNN file when profiling a secret DNN.
--ref-pass REF_PASSWORD The password for REF.
-@, --at INDEX Show only the information for the layer at the given index.
--estimate MODE Execution time estimation mode. Can be robust (default), min or ave.
--csv Output information in CSV format.
--help Show the command help.

The example below shows the status output after the profile command has been run to measure execution times.

$ softneuro status mobilenet_prof
0  ? (source)
1  ? (madd)      cpu (3)    28  cpu/avx  {"ops_in_task":16384}
2  ? (sink)

 #  NAME                                              ROUTINE    TIME  DESC            PARAMS
 0  input_1 (source)
 1  conv1 (conv2)                                     cpu (15)    213  cpu/owc64_avx   {"cache":8192,"task_ops":131072}
 2  conv_dw_1 (depthwise_conv2)                       cpu (3)     110  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
 3  conv_pw_1 (conv2)                                 cpu (47)    195  cpu/m1x1l_avx   {"cache":1048576,"oxynum_in_task":144}
 4  conv_dw_2 (depthwise_conv2)                       cpu (3)      60  cpu/owc32_avx   {"cache":8192,"task_ops":32768}
 5  conv_pw_2 (conv2)                                 cpu (47)    177  cpu/m1x1l_avx   {"cache":1048576,"oxynum_in_task":72}
 6  conv_dw_3 (depthwise_conv2)                       cpu (3)     113  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
 7  conv_pw_3 (conv2)                                 cpu (47)    328  cpu/m1x1l_avx   {"cache":1048576,"oxynum_in_task":36}
 8  conv_dw_4 (depthwise_conv2)                       cpu (3)      40  cpu/owc32_avx   {"cache":8192,"task_ops":32768}
 9  conv_pw_4 (conv2)                                 cpu (47)    167  cpu/m1x1l_avx   {"cache":1048576,"oxynum_in_task":96}
10  conv_dw_5 (depthwise_conv2)                       cpu (3)      68  cpu/owc32_avx   {"cache":8192,"task_ops":131072}
11  conv_pw_5 (conv2)                                 cpu (47)    320  cpu/m1x1l_avx   {"cache":1048576,"oxynum_in_task":96}
12  conv_dw_6 (depthwise_conv2)                       cpu (3)      23  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
13  conv_pw_6 (conv2)                                 cpu (47)    164  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":16}
14  conv_dw_7 (depthwise_conv2)                       cpu (3)      34  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
15  conv_pw_7 (conv2)                                 cpu (47)    313  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":16}
16  conv_dw_8 (depthwise_conv2)                       cpu (3)      34  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
17  conv_pw_8 (conv2)                                 cpu (47)    313  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":16}
18  conv_dw_9 (depthwise_conv2)                       cpu (3)      34  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
19  conv_pw_9 (conv2)                                 cpu (47)    313  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":16}
20  conv_dw_10 (depthwise_conv2)                      cpu (3)      34  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
21  conv_pw_10 (conv2)                                cpu (47)    313  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":16}
22  conv_dw_11 (depthwise_conv2)                      cpu (3)      34  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
23  conv_pw_11 (conv2)                                cpu (47)    313  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":16}
24  conv_dw_12 (depthwise_conv2)                      cpu (3)      14  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
25  conv_pw_12 (conv2)                                cpu (47)    169  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":16}
26  conv_dw_13 (depthwise_conv2)                      cpu (3)      19  cpu/owc32_avx   {"cache":8192,"task_ops":32768}
27  conv_pw_13 (conv2)                                cpu (47)    336  cpu/m1x1l2_avx  {"cache":1048576,"oxynum_in_task":8}
28  global_average_pooling2d_1 (global_average_pool)  cpu (1)      13  cpu/naive       {}
29  reshape_1 (reshape)                               cpu (1)       0  cpu             {}
30  conv_preds (conv2)                                cpu (47)     23  cpu/owc64_avx   {"cache":8192,"task_ops":32768}
31  act_softmax (softmax)                             cpu (1)      21  cpu/naive       {}
32  reshape_2 (reshape)                               cpu (1)       0  cpu             {}
33  sink_0 (sink)

TOTAL     4,336
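
The plain-text table can be post-processed with a few lines of Python, for instance to find the slowest layers. The sketch below is only an illustration: it assumes that columns are separated by runs of two or more spaces, as in the example output, and the function name is hypothetical. It also checks the arithmetic of the example: the TIME values of both tables add up to the reported TOTAL.

```python
import re

# A few rows copied from the status output above.
excerpt = """\
 0  input_1 (source)
 1  conv1 (conv2)                                     cpu (15)    213  cpu/owc64_avx   {"cache":8192,"task_ops":131072}
 2  conv_dw_1 (depthwise_conv2)                       cpu (3)     110  cpu/owc32_avx   {"cache":8192,"task_ops":65536}
28  global_average_pooling2d_1 (global_average_pool)  cpu (1)      13  cpu/naive       {}
33  sink_0 (sink)
"""


def parse_row(line):
    """Split one row on runs of two or more spaces (assumed column separator)."""
    cells = re.split(r"\s{2,}", line.strip())
    row = {"index": int(cells[0]), "name": cells[1]}
    if len(cells) >= 4:  # rows without a routine only have an index and a name
        row["routine"] = cells[2]
        row["time"] = int(cells[3].replace(",", ""))
    return row


rows = [parse_row(line) for line in excerpt.splitlines()]
timed = [r for r in rows if "time" in r]
slowest = max(timed, key=lambda r: r["time"])
print("slowest layer in excerpt:", slowest["name"], slowest["time"])

# TIME column of the full example: the madd entry of the first table (28)
# followed by layers 1 to 32 of the main table.
times = [28, 213, 110, 195, 60, 177, 113, 328, 40, 167, 68, 320, 23, 164,
         34, 313, 34, 313, 34, 313, 34, 313, 34, 313, 14, 169, 19, 336,
         13, 0, 23, 21, 0]
print("sum of TIME values:", sum(times))  # 4336, the TOTAL shown above
```

For machine-readable output, the --csv flag is likely the more robust choice; its exact column layout is not shown here, so the sketch sticks to the text format.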


The softneuro profile command runs profiling based on the profiling data.


usage: softneuro profile [--dnn DNN] [--pass PASSWORD] [--help] PROF


Argument Description
PROF Directory containing profiling data.


Flag Description
--dnn DNN A DNN file used to create profiling data.
-p, --pass PASSWORD The password required to use the encrypted prof file.
--help Shows the command help.

After using the init command to generate profiling data, the profile command measures execution times and saves the profiling information into the profiling data directory.

$ softneuro profile mobilenet_prof
profiling...100.0% [00:01]
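
The commands described so far can be chained into a manual profile-then-tune run. The Python sketch below only builds the command lines and prints them. It is an illustration under assumptions: the ordering of the steps is inferred from the descriptions on this page, the function name and the mobilenet_tuned.dnn file name are hypothetical, and only commands and flags documented here are used (including tune --prof, described in the tune section).

```python
import shlex


def profiling_workflow(dnn, prof_dir, tuned_dnn, routine="cpu"):
    """Return the softneuro command lines for a manual profile-then-tune run.

    Assumption: the steps are run in this order.
    """
    return [
        ["softneuro", "init", prof_dir, dnn],      # create the profiling data
        ["softneuro", "add", prof_dir, routine],   # choose the routines to measure
        ["softneuro", "profile", prof_dir],        # measure execution times
        ["softneuro", "status", prof_dir],         # inspect the measurements
        ["softneuro", "tune", "--prof", prof_dir, dnn, tuned_dnn],
    ]


commands = profiling_workflow("mobilenet.dnn", "mobilenet_prof", "mobilenet_tuned.dnn")
for argv in commands:
    print("$", shlex.join(argv))
```

On a machine where softneuro is installed, each list could be passed to subprocess.run(argv, check=True) to execute the step and stop at the first failure.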


The softneuro tune command tunes a DNN file for faster inference. If profiling data isn't provided, the command runs profiling automatically.


usage: softneuro tune [--prof PROF] [--recipe RECIPE] [--thread NTHREADS]
                      [--affinity MASK[@THREAD_INDICES]] [--pass PASSWORD]
                      [--routine ROUTINE[@LAYER_INDICES]]... [--estimate MODE] [--help]
                      INPUT OUTPUT


Argument Description
INPUT DNN file to be tuned.
OUTPUT Output tuned DNN file.


Flag Description
--prof PROF Directory containing profiling data.
--recipe RECIPE Directory containing recipe data.
--thread NTHREADS How many threads should be used for execution. Defaults to the number of CPU cores.
--affinity MASK[@THREAD_INDICES] Use the affinity mask given by MASK on the threads given by THREAD_INDICES.
MASK should be a little endian hexadecimal (0x..), binary (0b..), or decimal number.
If THREAD_INDICES isn't set, all threads will use the given mask.
For more information on THREAD_INDICES use the softneuro help thread_indices command.
-p, --pass PASSWORD Password if the DNN file is encrypted.
-r, --routine ROUTINE[@LAYER_INDICES] Set the routine given by ROUTINE to the layers given by LAYER_INDICES.
If LAYER_INDICES isn't set, the routine will be set to all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command.
--estimate MODE Execution time estimation mode. Can be robust (default), min or ave.
-h, --help Shows the command help.

After tuning, the vgg16_tuned.dnn file will be created.

$ softneuro tune vgg16.dnn vgg16_tuned.dnn
adding cpu routines...done.
profiling...100.0% [00:56] ETA[00:00]
0  ? (source)
1  ? (permute)   cpu (1)   155  cpu/naive  {}
2  ? (madd)      cpu (3)    29  cpu/avx    {"ops_in_task":16384}
3  ? (sink)

 #  NAME                           ROUTINE     TIME  DESC           PARAMS
 0  input_1 (source)
 1  block1_conv1 (conv2)           cpu (67)   1,239  cpu/owc64_avx  {"cache":8192,"task_ops":131072}

TOTAL  59,463

Tuning for OpenCL usage:

$ softneuro tune --routine opencl/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]

Tuning for OpenCL(float16) usage:

$ softneuro tune --routine opencl:float16/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]

Tuning for CUDA usage:

$ softneuro tune --routine cuda/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]

Tuning for CUDA(float16) usage:

$ softneuro tune --routine cuda:float16/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]

Tuning for 8bit quantization mode:

$ softneuro tune --routine cpu:qint8/fast vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]
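
The --routine arguments in the examples above combine a routine with an optional layer selection. The following Python sketch shows one way to read them. It is based on assumptions inferred from the examples only: that a..b denotes an inclusive range and that commas separate items. The authoritative format is given by softneuro help layer_indices, and the function names are hypothetical.

```python
def parse_layer_indices(text):
    """Expand e.g. '2..23' or '1,24' into a list of layer indices.

    Assumption: 'a..b' is an inclusive range and ',' separates items.
    """
    indices = []
    for part in text.split(","):
        if ".." in part:
            start, end = part.split("..")
            indices.extend(range(int(start), int(end) + 1))
        else:
            indices.append(int(part))
    return indices


def parse_routine_arg(arg):
    """Split 'ROUTINE[@LAYER_INDICES]' into (routine, indices or None)."""
    routine, sep, indices = arg.partition("@")
    return routine, (parse_layer_indices(indices) if sep else None)


# Arguments taken from the OpenCL tuning example above.
assignment = {}
for arg in ("opencl/fast@2..23", "cpu@1,24"):
    routine, layers = parse_routine_arg(arg)
    for layer in layers:
        assignment[layer] = routine

print(sorted(assignment) == list(range(1, 25)))  # layers 1 to 24 are all covered
print(parse_routine_arg("cpu:qint8/fast"))       # no '@': all main-network layers
```

Read this way, the OpenCL example assigns opencl/fast to layers 2 to 23 and cpu to layers 1 and 24, so every layer from 1 to 24 gets exactly one routine.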