To post-process the output of the hardware run, scripts are provided under the Segment/workspace/scripts/postprocess folder.
The following steps can be performed to complete this process:
Copy the results folder from the target hardware back to the host machine into the Segment/workspace/scripts/postprocess folder.
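For example, if the board is reachable over the network, an scp command along the following lines can be used. Note that the IP address, user name, and remote path below are placeholders; substitute the values for your setup:

```
# Copy the results folder from the target board back to the host machine.
# 192.168.1.10, root, and /home/root/results are placeholders for your
# board's IP address, login, and the location of the results on the target.
scp -r root@192.168.1.10:/home/root/results ./Segment/workspace/scripts/postprocess/
```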
Make sure your $CITYSCAPES_DATASET variable is exported properly to the location of the dataset. If you have not done this, a default location will be used, which will cause the script to fail unless your location matches the default used in the script.
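A minimal sketch of this step (the path is a placeholder; substitute the actual location where you extracted the Cityscapes dataset):

```
# Placeholder path; replace with your actual Cityscapes dataset location.
export CITYSCAPES_DATASET=/home/user/datasets/cityscapes
```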
Next you need to prepare the validation images that will be used as the ground truth for comparison against the model output. This can be done by running the cls34_to_cls19.py script by entering python cls34_to_cls19.py. This step only needs to be performed once for the ground truth images, so if you already completed it as part of section 4.2, you can skip it. Note that the converted labels will be stored in a folder called test_gtFine_cls19 where your $CITYSCAPES_DATASET is located.
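For reference, assuming you run it from the postprocess folder where the script resides, the command is simply:

```
# One-time conversion of the 34-class ground truth labels to the 19
# training classes; the output is written to the test_gtFine_cls19
# folder under $CITYSCAPES_DATASET.
python cls34_to_cls19.py
```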
Now run the eval_segmentation.sh script by entering ./eval_segmentation.sh.
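A minimal invocation (the chmod is only needed if the execute permission was lost when the files were copied over):

```
# Restore the execute bit if necessary, then run the evaluation script,
# which compares the hardware results against the 19-class ground truth.
chmod +x eval_segmentation.sh
./eval_segmentation.sh
```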
The output of this step should be a list of IOUs, starting with the mIOU for all classes (this is the number to compare to the decent_q quantized model mIOU). The remaining numbers are per-class IOUs for the validation dataset. I have already completed this step for the pre-trained models; refer back to section "3.1.0 About the Pre-Trained Models" to see the results.
Looking back, we've covered a lot of ground: we walked through the process of preparing, training, testing, and deploying five different segmentation models. The goal of this tutorial was not to show a perfectly optimized solution, but rather to blaze a trail so that you experts and explorers can streamline your own segmentation model development and rapidly deploy those models on a Xilinx SoC/MPSoC.
The beauty of this solution is that there is a full portfolio of Zynq-7000 and Zynq UltraScale+ devices (qualified for commercial, industrial, automotive, and aerospace and defense end markets) and various DPU configurations, allowing you to scale down for low-power applications that require < 3W, or dial up the performance for higher-end products (where you may need hundreds to thousands of FPS), such as deployments on PCIe accelerator-class products like Alveo.
All of this is possible while using the same design approaches and tool flow, without re-inventing algorithms for low-end vs. high-end products. You can even make trade-offs between DPU size and traditional pre/post-processing hardware acceleration (e.g. optical flow, stereo block matching, scaling, de-interlacing, FFTs, or even custom Image Sensor Pipelines). The number of potential implementations is virtually endless, so you can truly build an optimal solution for your application that maximizes your differentiation.
Jon Cory is located near Detroit, Michigan and serves as an automotive-focused Machine Learning Specialist Field Applications Engineer (FAE) for AMD. Jon's key roles include introducing AMD ML solutions, training customers on the ML tool flow, and assisting with deployment and optimization of ML algorithms in AMD devices. Previously, Jon spent two years as an Embedded Vision Specialist (FAE) with a focus on handcrafted feature algorithms, Vivado HLS, and ML algorithms, and six years prior as an AMD Generalist FAE in Michigan/Northern Ohio. Jon is happily married for four years to Monica Cory and enjoys a wide variety of interests including music, running, traveling, and losing at online video games.