Run and Profiling Commands¶
run¶
Runs inference using the given DNN file. It can output the inference results and execution times.
Usage
usage: softneuro run [-o ONPY]... [-p PASSWORD] [--recipe RECIPE] [--batch BATCH]
[--ishape SHAPE] [--keep_img_ar PADDINGCOLOR]
[--img_resize_mode RESIZEMODE] [--thread NTHREADS] [--noboost]
[--affinity MASK[@THREAD_INDICES]]
[-r ROUTINE[@LAYER_INDICES]] [-R RPARAMS[@LAYER_INDICES]] [--nobufopt DEVICE]
[--lib LIB] [-l LNUM] [--detail] [--detail2] [--bylayer]
[--dump DUMPDIR] [--dump2 DUMPDIR] [-t NTOPS] [-h]
DNN [INPUT [INPUT ...]]
Arguments
Argument | Description |
---|---|
DNN | DNN file for inference execution. |
INPUT | Input for inference execution. Can be a numpy file or an image file. If not provided, input will be uniform random numbers from [-1, 1]. |
Flags
Flag | Description |
---|---|
-p PASSWORD, --pass PASSWORD | Password to run an encrypted DNN file. |
-o ONPY | File name to output the inference results as a numpy file. |
--recipe RECIPE | Set the DNN recipe file. |
--batch BATCH | Input batch size. |
--ishape SHAPE | Input shape. (Example: 1x224x224x3). |
--keep_img_ar PADDINGCOLOR | Keeps aspect ratio when resizing input image. The aspect ratio is not kept by default. Margin space is filled with the color specified by PADDINGCOLOR. PADDINGCOLOR can be specified by RGB value, for example, '0, 0, 0'. |
--img_resize_mode RESIZEMODE | Specifies the resizing mode. 'bilinear' or 'nearest' can be specified. Default is 'bilinear'. |
--thread NTHREADS | How many threads should be used for execution. Defaults to the number of CPU cores. |
--noboost | Make threads wait on a condition variable while waiting for a task. If this isn't set, threads busy-wait. |
--affinity MASK[@THREAD_INDICES] | Use the affinity mask given by MASK on the threads given by THREAD_INDICES. MASK should be a little endian hexadecimal (0x..), binary (0b..), or decimal number. If THREAD_INDICES isn't set, all threads will use the given mask. For more information on THREAD_INDICES use the softneuro help thread_indices command. |
-r, --routine ROUTINE[@LAYER_INDICES] | Set the routines to use. If not set, the best available routine is usually chosen (e.g. CUDA when CUDA support is present); otherwise the default is cpu. This setting is ignored if the model is tuned. If LAYER_INDICES isn't set, the routine is applied to all layers in the main network. For more information on LAYER_INDICES use the softneuro help layer_indices command. |
-R RPARAMS[@LAYER_INDICES], --rparams RPARAMS[@LAYER_INDICES] | Set the routine parameters to use. This setting is ignored if the model is tuned. If LAYER_INDICES isn't set, the parameters are applied to all layers in the main network. For more information on LAYER_INDICES use the softneuro help layer_indices command. |
--nobufopt DEVICE | Disable the buffer optimizer for routines that run on the given device. |
--lib LIB | Set an OpenCL binary file when using online compilation. |
-l, --loop LNUM | Run inference LNUM times for benchmarking. |
--detail | Show detailed inference statistics. |
--detail2 | Show even more detailed inference statistics. If a layer is made up of other layers this shows the processing times of the internal layers as well. |
--bylayer | Show detailed inference statistics by layer. |
--dump DUMPDIR | Dump each layer output as a numpy file in the given folder. |
--dump2 DUMPDIR | Dump each layer output and internal layer outputs if they exist in the given folder. |
-t, --top NTOPS | Shows the top NTOPS scores and labels for image classification models. |
-h, --help | Shows the command help. |
Example
$ softneuro run densenet121.dnn --thread 8 --affinity 0xf0@0..3 --affinity 0x0f@4..7 --top 5 --loop 10 shovel.jpg
---------------------------------
Top 5 Labels
---------------------------------
# SCORE LABEL
1 0.9999 shovel
2 0.0001 hatchet
3 0.0000 broom
4 0.0000 swab
5 0.0000 spatula
---------------------------------
Statistics
---------------------------------
FUNCTION AVE(us) MIN(us) MAX(us) #RUN
Dnn_load() 43,070 43,070 43,070 1
Dnn_compile() 28,567 28,567 28,567 1
Dnn_forward() 39,877 39,751 39,983 10
Used memory: 88,403,968 Bytes
---------------------------------
Benchmark
---------------------------------
preprocess: 81 68 68 69 70 64 71 69 70 68
main: 39872 39698 39710 39679 39896 39884 39770 39801 39908 39824
TOTAL: 39955 39767 39778 39749 39968 39949 39842 39873 39981 39894
The inference time is given by Dnn_forward().
AVE, MIN and MAX are, respectively, the average, minimum and maximum execution times for the number of runs shown under #RUN.
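The dump and output flags compose with a normal run; as a minimal sketch, the following saves the final result as a numpy file and dumps each layer output into a directory (the file and directory names here are hypothetical):
$ softneuro run densenet121.dnn shovel.jpg -o result.npy --dump layer_outputs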
init¶
Initializes profiling data.
Usage
usage: softneuro init [--thread NTHREADS] [--affinity MASK[@THREAD_INDICES]]
[--pass PASSWORD] [--help]
PROF DNN
Arguments
Argument | Description |
---|---|
PROF | Directory where the profiling data will be initialized. |
DNN | DNN file to be profiled. |
Flags
Flag | Description |
---|---|
--thread NTHREADS | How many threads should be used for execution. Defaults to the number of CPU cores. |
--affinity MASK[@THREAD_INDICES] | Use the affinity mask given by MASK on the threads given by THREAD_INDICES. MASK should be a little endian hexadecimal (0x..), binary (0b..), or decimal number. If THREAD_INDICES isn't set, all threads will use the given mask. For more information on THREAD_INDICES use the softneuro help thread_indices command. |
--pass PASSWORD | Password to profile an encrypted DNN file. |
-h, --help | Shows the command help. |
Example
The command below creates the mobilenet_prof directory with profiling data. Note that there is no terminal output.
$ softneuro init mobilenet_prof mobilenet.dnn
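If the target should run on specific cores, the thread count and affinity mask can be fixed at initialization as well; a sketch, with an illustrative 4-thread configuration pinned to the lower four cores:
$ softneuro init --thread 4 --affinity 0x0f mobilenet_prof mobilenet.dnn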
add¶
Assigns routines to layers in the profiling data.
Usage
usage: softneuro add [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--help]
PROF [ROUTINE[@LAYER_INDICES]]...
Arguments
Argument | Description |
---|---|
PROF | Directory containing profiling data. |
ROUTINE[@LAYER_INDICES] | Set the routine given by ROUTINE on the layers given by LAYER_INDICES. If LAYER_INDICES isn't set, the routine is applied to all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command. |
Flags
Flag | Description |
---|---|
--dnn DNN | A DNN file used to create profiling data. |
-p, --pass PASSWORD | The password required to use the prof file. |
--ref REF | The reference DNN file when profiling a secret DNN. |
--ref-pass REF_PASSWORD | The password for REF. |
-h, --help | Shows the command help. |
Example
Set all main network layers to use the cpu:qint8 routine, if supported.
$ softneuro add mobilenet_prof cpu:qint8
adding routines...done.
$ softneuro status mobilenet_prof
[preprocess]
# NAME ROUTINE TIME DESC PARAMS
0 ? (source)
1 ? (madd)
2 ? (sink)
?
[main]
# NAME ROUTINE TIME DESC PARAMS
0 input_1 (source)
1 conv1 (conv2) cpu:qint8 (9)
2 conv_dw_1 (depthwise_conv2) cpu:qint8 (3)
3 conv_pw_1 (conv2) cpu:qint8 (9)
4 conv_dw_2 (depthwise_conv2) cpu:qint8 (3)
5 conv_pw_2 (conv2) cpu:qint8 (9)
6 conv_dw_3 (depthwise_conv2) cpu:qint8 (3)
7 conv_pw_3 (conv2) cpu:qint8 (9)
8 conv_dw_4 (depthwise_conv2) cpu:qint8 (3)
9 conv_pw_4 (conv2) cpu:qint8 (9)
10 conv_dw_5 (depthwise_conv2) cpu:qint8 (3)
11 conv_pw_5 (conv2) cpu:qint8 (9)
12 conv_dw_6 (depthwise_conv2) cpu:qint8 (3)
13 conv_pw_6 (conv2) cpu:qint8 (9)
14 conv_dw_7 (depthwise_conv2) cpu:qint8 (3)
15 conv_pw_7 (conv2) cpu:qint8 (9)
16 conv_dw_8 (depthwise_conv2) cpu:qint8 (3)
17 conv_pw_8 (conv2) cpu:qint8 (9)
18 conv_dw_9 (depthwise_conv2) cpu:qint8 (3)
19 conv_pw_9 (conv2) cpu:qint8 (9)
20 conv_dw_10 (depthwise_conv2) cpu:qint8 (3)
21 conv_pw_10 (conv2) cpu:qint8 (9)
22 conv_dw_11 (depthwise_conv2) cpu:qint8 (3)
23 conv_pw_11 (conv2) cpu:qint8 (9)
24 conv_dw_12 (depthwise_conv2) cpu:qint8 (3)
25 conv_pw_12 (conv2) cpu:qint8 (9)
26 conv_dw_13 (depthwise_conv2) cpu:qint8 (3)
27 conv_pw_13 (conv2) cpu:qint8 (9)
28 global_average_pooling2d_1 (global_average_pool)
29 reshape_1 (reshape) cpu:qint8 (1)
30 conv_preds (conv2) cpu:qint8 (9)
31 act_softmax (softmax)
32 reshape_2 (reshape) cpu:qint8 (1)
33 sink_0 (sink)
?
ROUTINES cpu:qint8
TOTAL ?
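Routines can also be assigned to a subset of layers through LAYER_INDICES; for instance, a sketch that applies cpu:qint8 only to layers 1 through 27 (the index range is illustrative):
$ softneuro add mobilenet_prof cpu:qint8@1..27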
rm¶
Removes the given routines from the profiling data.
Usage
usage: softneuro rm [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--help] PROF [ROUTINE[@LAYER_INDICES]]...
Arguments
Argument | Description |
---|---|
PROF | Directory containing profiling data. |
ROUTINE[@LAYER_INDICES] | Remove the routine given by ROUTINE from the layers given by LAYER_INDICES. If LAYER_INDICES isn't set, the routine is removed from all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command. |
Flags
Flag | Description |
---|---|
--dnn DNN | A DNN file used to create profiling data. |
-p PASSWORD, --pass PASSWORD | The password required to use the prof file. |
--ref REF | The reference DNN file when profiling a secret DNN. |
--ref-pass REF_PASSWORD | The password for REF. |
-h, --help | Shows the command help. |
Example
Remove all routine settings from mobilenet_prof.
$ softneuro rm mobilenet_prof
removing routines...done.
$ softneuro status mobilenet_prof
[preprocess]
# NAME ROUTINE TIME DESC PARAMS
0 ? (source)
1 ? (madd)
2 ? (sink)
?
[main]
# NAME ROUTINE TIME DESC PARAMS
0 input_1 (source)
1 conv1 (conv2)
2 conv_dw_1 (depthwise_conv2)
3 conv_pw_1 (conv2)
4 conv_dw_2 (depthwise_conv2)
5 conv_pw_2 (conv2)
6 conv_dw_3 (depthwise_conv2)
7 conv_pw_3 (conv2)
8 conv_dw_4 (depthwise_conv2)
9 conv_pw_4 (conv2)
10 conv_dw_5 (depthwise_conv2)
11 conv_pw_5 (conv2)
12 conv_dw_6 (depthwise_conv2)
13 conv_pw_6 (conv2)
14 conv_dw_7 (depthwise_conv2)
15 conv_pw_7 (conv2)
16 conv_dw_8 (depthwise_conv2)
17 conv_pw_8 (conv2)
18 conv_dw_9 (depthwise_conv2)
19 conv_pw_9 (conv2)
20 conv_dw_10 (depthwise_conv2)
21 conv_pw_10 (conv2)
22 conv_dw_11 (depthwise_conv2)
23 conv_pw_11 (conv2)
24 conv_dw_12 (depthwise_conv2)
25 conv_pw_12 (conv2)
26 conv_dw_13 (depthwise_conv2)
27 conv_pw_13 (conv2)
28 global_average_pooling2d_1 (global_average_pool)
29 reshape_1 (reshape)
30 conv_preds (conv2)
31 act_softmax (softmax)
32 reshape_2 (reshape)
33 sink_0 (sink)
?
TOTAL ?
?
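Instead of clearing everything, a single routine can be removed from selected layers; a sketch with illustrative indices:
$ softneuro rm mobilenet_prof cpu:qint8@1..5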
reset¶
Resets profiling data to defaults.
Usage
usage: softneuro reset [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--help]
PROF [ROUTINE[@LAYER_INDICES]]...
Arguments
Argument | Description |
---|---|
PROF | Directory containing profiling data. |
ROUTINE[@LAYER_INDICES] | Reset the routine given by ROUTINE on the layers given by LAYER_INDICES. If LAYER_INDICES isn't set, the routine is reset on all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command. |
Flags
Flag | Description |
---|---|
--dnn DNN | A DNN file used to create profiling data. |
-p PASSWORD, --pass PASSWORD | The password required to use the prof file. |
--ref REF | The reference DNN file when profiling a secret DNN. |
--ref-pass REF_PASSWORD | The password for REF. |
-h, --help | Shows the command help. |
Example
Reset the mobilenet_prof profiling data.
$ softneuro reset mobilenet_prof
resetting routines...done.
$ softneuro status mobilenet_prof
[preprocess]
# NAME ROUTINE TIME DESC PARAMS
0 ? (source)
1 ? (madd) cpu (3)
2 ? (sink)
?
[main]
# NAME ROUTINE TIME DESC PARAMS
0 input_1 (source)
1 conv1 (conv2) cpu (15)
2 conv_dw_1 (depthwise_conv2) cpu (3)
3 conv_pw_1 (conv2) cpu (47)
4 conv_dw_2 (depthwise_conv2) cpu (3)
5 conv_pw_2 (conv2) cpu (47)
6 conv_dw_3 (depthwise_conv2) cpu (3)
7 conv_pw_3 (conv2) cpu (47)
8 conv_dw_4 (depthwise_conv2) cpu (3)
9 conv_pw_4 (conv2) cpu (47)
10 conv_dw_5 (depthwise_conv2) cpu (3)
11 conv_pw_5 (conv2) cpu (47)
12 conv_dw_6 (depthwise_conv2) cpu (3)
13 conv_pw_6 (conv2) cpu (47)
14 conv_dw_7 (depthwise_conv2) cpu (3)
15 conv_pw_7 (conv2) cpu (47)
16 conv_dw_8 (depthwise_conv2) cpu (3)
17 conv_pw_8 (conv2) cpu (47)
18 conv_dw_9 (depthwise_conv2) cpu (3)
19 conv_pw_9 (conv2) cpu (47)
20 conv_dw_10 (depthwise_conv2) cpu (3)
21 conv_pw_10 (conv2) cpu (47)
22 conv_dw_11 (depthwise_conv2) cpu (3)
23 conv_pw_11 (conv2) cpu (47)
24 conv_dw_12 (depthwise_conv2) cpu (3)
25 conv_pw_12 (conv2) cpu (47)
26 conv_dw_13 (depthwise_conv2) cpu (3)
27 conv_pw_13 (conv2) cpu (47)
28 global_average_pooling2d_1 (global_average_pool) cpu (1)
29 reshape_1 (reshape) cpu (1)
30 conv_preds (conv2) cpu (47)
31 act_softmax (softmax) cpu (1)
32 reshape_2 (reshape) cpu (1)
33 sink_0 (sink)
?
ROUTINES cpu
TOTAL ?
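Per the usage line, an optional routine argument presumably restricts the reset to matching layers; a sketch (illustrative):
$ softneuro reset mobilenet_prof cpu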
status¶
Shows the routines, parameters and measured profiling times for each layer.
Usage
usage: softneuro status [--dnn DNN] [--pass PASSWORD] [--ref REF] [--ref-pass REF_PASSWORD] [--at INDEX]
[--estimate MODE] [--csv] [--help]
PROF
Arguments
Argument | Description |
---|---|
PROF | Directory containing profiling data. |
Flags
Flag | Description |
---|---|
--dnn DNN | A DNN file used to create profiling data. |
-p, --pass PASSWORD | The password required to use the encrypted prof file. |
--ref REF | The reference DNN file when profiling a secret DNN. |
--ref-pass REF_PASSWORD | The password for REF. |
-@, --at INDEX | Show only the information for the layer at the given index. |
--estimate MODE | Execution time estimation mode. Can be robust (default), min or ave . |
--csv | Output information in CSV format. |
--help | Shows the command help. |
Example
The example output below is from after running the profile command to measure execution times.
$ softneuro status mobilenet_prof
[preprocess]
# NAME ROUTINE TIME DESC PARAMS
0 ? (source)
1 ? (madd) cpu (3) 28 cpu/avx {"ops_in_task":16384}
2 ? (sink)
28
[main]
# NAME ROUTINE TIME DESC PARAMS
0 input_1 (source)
1 conv1 (conv2) cpu (15) 213 cpu/owc64_avx {"cache":8192,"task_ops":131072}
2 conv_dw_1 (depthwise_conv2) cpu (3) 110 cpu/owc32_avx {"cache":8192,"task_ops":65536}
3 conv_pw_1 (conv2) cpu (47) 195 cpu/m1x1l_avx {"cache":1048576,"oxynum_in_task":144}
4 conv_dw_2 (depthwise_conv2) cpu (3) 60 cpu/owc32_avx {"cache":8192,"task_ops":32768}
5 conv_pw_2 (conv2) cpu (47) 177 cpu/m1x1l_avx {"cache":1048576,"oxynum_in_task":72}
6 conv_dw_3 (depthwise_conv2) cpu (3) 113 cpu/owc32_avx {"cache":8192,"task_ops":65536}
7 conv_pw_3 (conv2) cpu (47) 328 cpu/m1x1l_avx {"cache":1048576,"oxynum_in_task":36}
8 conv_dw_4 (depthwise_conv2) cpu (3) 40 cpu/owc32_avx {"cache":8192,"task_ops":32768}
9 conv_pw_4 (conv2) cpu (47) 167 cpu/m1x1l_avx {"cache":1048576,"oxynum_in_task":96}
10 conv_dw_5 (depthwise_conv2) cpu (3) 68 cpu/owc32_avx {"cache":8192,"task_ops":131072}
11 conv_pw_5 (conv2) cpu (47) 320 cpu/m1x1l_avx {"cache":1048576,"oxynum_in_task":96}
12 conv_dw_6 (depthwise_conv2) cpu (3) 23 cpu/owc32_avx {"cache":8192,"task_ops":65536}
13 conv_pw_6 (conv2) cpu (47) 164 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":16}
14 conv_dw_7 (depthwise_conv2) cpu (3) 34 cpu/owc32_avx {"cache":8192,"task_ops":65536}
15 conv_pw_7 (conv2) cpu (47) 313 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":16}
16 conv_dw_8 (depthwise_conv2) cpu (3) 34 cpu/owc32_avx {"cache":8192,"task_ops":65536}
17 conv_pw_8 (conv2) cpu (47) 313 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":16}
18 conv_dw_9 (depthwise_conv2) cpu (3) 34 cpu/owc32_avx {"cache":8192,"task_ops":65536}
19 conv_pw_9 (conv2) cpu (47) 313 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":16}
20 conv_dw_10 (depthwise_conv2) cpu (3) 34 cpu/owc32_avx {"cache":8192,"task_ops":65536}
21 conv_pw_10 (conv2) cpu (47) 313 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":16}
22 conv_dw_11 (depthwise_conv2) cpu (3) 34 cpu/owc32_avx {"cache":8192,"task_ops":65536}
23 conv_pw_11 (conv2) cpu (47) 313 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":16}
24 conv_dw_12 (depthwise_conv2) cpu (3) 14 cpu/owc32_avx {"cache":8192,"task_ops":65536}
25 conv_pw_12 (conv2) cpu (47) 169 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":16}
26 conv_dw_13 (depthwise_conv2) cpu (3) 19 cpu/owc32_avx {"cache":8192,"task_ops":32768}
27 conv_pw_13 (conv2) cpu (47) 336 cpu/m1x1l2_avx {"cache":1048576,"oxynum_in_task":8}
28 global_average_pooling2d_1 (global_average_pool) cpu (1) 13 cpu/naive {}
29 reshape_1 (reshape) cpu (1) 0 cpu {}
30 conv_preds (conv2) cpu (47) 23 cpu/owc64_avx {"cache":8192,"task_ops":32768}
31 act_softmax (softmax) cpu (1) 21 cpu/naive {}
32 reshape_2 (reshape) cpu (1) 0 cpu {}
33 sink_0 (sink)
4,308
ROUTINES cpu
TOTAL 4,336
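The --at and --csv flags narrow or export this output; for example, a sketch that inspects only layer 27 and then writes the whole table as CSV (the output file name is hypothetical):
$ softneuro status --at 27 mobilenet_prof
$ softneuro status --csv mobilenet_prof > mobilenet_prof.csv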
profile¶
Runs profiling based on the profiling data.
Usage
usage: softneuro profile [--dnn DNN] [--pass PASSWORD] [--help] PROF
Arguments
Argument | Description |
---|---|
PROF | Directory containing profiling data. |
Flags
Flag | Description |
---|---|
--dnn DNN | A DNN file used to create profiling data. |
-p, --pass PASSWORD | The password required to use the encrypted prof file. |
--help | Shows the command help. |
Example
After using the init command to generate profiling data, the profile command measures execution times and saves the profiling information into the profiling data directory.
$ softneuro profile mobilenet_prof
profiling...100.0% [00:01]
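Putting the profiling commands together, a typical end-to-end sketch looks like this (file and directory names illustrative):
$ softneuro init mobilenet_prof mobilenet.dnn
$ softneuro add mobilenet_prof cpu:qint8
$ softneuro profile mobilenet_prof
$ softneuro status mobilenet_prof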
tune¶
Tunes a DNN file for faster inference times. If profiling data isn't provided, the command automatically runs profiling.
Usage
usage: softneuro tune [--prof PROF] [--recipe RECIPE] [--thread NTHREADS]
[--affinity MASK[@THREAD_INDICES]] [--pass PASSWORD]
[--routine ROUTINE[@LAYER_INDICES]]... [--estimate MODE] [--help]
INPUT OUTPUT
Arguments
Argument | Description |
---|---|
INPUT | DNN file to be tuned. |
OUTPUT | Output tuned DNN file. |
Flags
Flag | Description |
---|---|
--prof PROF | Directory containing profiling data. |
--recipe RECIPE | Directory containing recipe data. |
--thread NTHREADS | How many threads should be used for execution. Defaults to the number of CPU cores. |
--affinity MASK[@THREAD_INDICES] | Use the affinity mask given by MASK on the threads given by THREAD_INDICES. MASK should be a little endian hexadecimal (0x..), binary (0b..), or decimal number. If THREAD_INDICES isn't set, all threads will use the given mask. For more information on THREAD_INDICES use the softneuro help thread_indices command. |
-p, --pass PASSWORD | Password if the DNN file is encrypted. |
-r, --routine ROUTINE[@LAYER_INDICES] | Set the routine given by ROUTINE to the layers given by LAYER_INDICES . If LAYER_INDICES isn't set, the routine will be set to all layers in the main network. The ROUTINE format can be checked with the softneuro help routine_desc command, and the LAYER_INDICES format can be checked with the softneuro help layer_indices command. |
--estimate MODE | Execution time estimation mode. Can be robust (default), min or ave . |
-h, --help | Shows the command help. |
Example
After tuning, the vgg16_tuned.dnn file will be created.
$ softneuro tune vgg16.dnn vgg16_tuned.dnn
adding cpu routines...done.
profiling...100.0% [00:56] ETA[00:00]
[preprocess]
# NAME ROUTINE TIME DESC PARAMS
0 ? (source)
1 ? (permute) cpu (1) 155 cpu/naive {}
2 ? (madd) cpu (3) 29 cpu/avx {"ops_in_task":16384}
3 ? (sink)
184
[main]
# NAME ROUTINE TIME DESC PARAMS
0 input_1 (source)
1 block1_conv1 (conv2) cpu (67) 1,239 cpu/owc64_avx {"cache":8192,"task_ops":131072}
:
TOTAL 59,463
Tuning for OpenCL usage:
$ softneuro tune --routine opencl/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]
Tuning for OpenCL(float16) usage:
$ softneuro tune --routine opencl:float16/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]
Tuning for CUDA usage:
$ softneuro tune --routine cuda/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]
Tuning for CUDA(float16) usage:
$ softneuro tune --routine cuda:float16/fast@2..23 --routine cpu@1,24 vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]
Tuning for 8bit quantization mode:
$ softneuro tune --routine cpu:qint8/fast vgg16.dnn vgg16_tuned.dnn
profiling..100.0% [01:23] ETR[00:00]
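A tuned DNN can then be benchmarked with the run command to confirm the improvement; a sketch (the loop count is illustrative, and with no INPUT given the input is uniform random):
$ softneuro run vgg16_tuned.dnn --loop 10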