Scaling up the software system on service robots significantly increases the maintenance burden of developers and the risk of resource contention of the computer embedded with robots. As a result, developers spend much time on configuring, deploying, and monitoring the robot software system; robots utilize significant computer resources when all software processes are running. We propose Rorg, a Linux container-based scheme to manage, schedule, and monitor software components on service robots. Although Linux container is already widely-used in cloud environments, this technique is challenging to efficiently adopt in service robot systems due to the unique characteristics of service robots -- multi-purpose systems with resource limitations and high performance requirements. To pave the way of Linux containers on service robots in an efficient manner, we propose a programmable container management interface and a resource time-sharing mechanism incorporated with the robot operating system (ROS). Rorg allows the developers to pack software into self-contained images and runs them in isolated environments using Linux containers; it also allows the robot to turn on and off software components on demand to avoid resource contention. We evaluate Rorg with a long-term autonomous tour guide robot: It manages 41 software components on the robot and relieved our maintenance burden, and it also reduces CPU by 45.5% and memory usage by 16.5% on average.
Service robots perform flawless demos while their developers are keeping a close eye, but they often fail when working autonomously, especially in a long-term deployment. To study failure modes and human-robot interaction patterns in long-term deployments, we built TritonBot, a long-term autonomy robot working as a building receptionist and a tour guide. It recognizes people’s face, talks to them, and guides people to the labs and facilities in an office building. This paper presents the design of TritonBot and the lessons we learned from the first-month deployment with respect to technical and human-robot interaction aspects. TritonBot and its variant BoxBot have worked for 108.7 hours, actively interacted with people for 22.1 hours, greeted people 2950 times, guided 150 tours, and traveled 9.9 kilometers. We share the components of TritonBot using an open licence to help the community to replicate the TritonBot platform and inspire long-term autonomy and human-robot interaction research.
Algorithms such as Bag of Words and Simhash have been widely used in image recognition. To achieve better performance as well as energy-efficiency, a hardware implementation of these two algorithms is proposed in this paper. To the best of our knowledge, it is the first time that these algorithms have been implemented on hardware for image recognition purpose. The proposed implementation is able to generate a fingerprint of an image and find the closest match in the database accurately. It is implemented on Xilinx’s Virtex-6 SX475T FPGA. Tradeoffs between high performance and low hardware overhead are obtained through proper parallelization. The experimental result shows that the proposed implementation can process 1,018 images per second, approximately 17.8x faster than software on Intel's 12-thread Xeon X5650 processor. On the other hand, the power consumption is 0.35x compared to software-based implementation. Thus, the overall advantage in energy-efficiency is as much as 46x. The proposed architecture is scalable, and is able to meet various requirements of image recognition.
Most of the implementations of boundary scan chains are of fixed length, typically hundreds or thousands. Because the whole chain is scanned every time, many clock cycles are wasted when only a small part of it is concerned. In this paper, a novel structure of configurable boundary scan chain is proposed. Its length and content can be reconfigured without interrupting the chip’s functionality. Experimental result shows that the maximum frequency can be as high as 510.4MHz for a full-configurable chain with 512 cells, under 32 nm process, which is 15.7x better than the intuitive method. The proposed structure has been applied to a processor prototype design, and is expected to meet requirements of different applications.
This paper proposes a high performance hardware architecture of Speeded Up Robust Features (SURF) algorithm based on OpenSURF. In order to achieve high processing frame rate, the hardware architecture is designed with several characteristics. Firstly, a sliding window method is proposed to extract feature points in parallel at selected scale levels. As a result, the time cost in feature extraction can be greatly reduced. Secondly, data reuse strategy is proposed in orientation generation and descriptor generation to reduce the memory access times. In this way, 3.87x and 2.25X speedup are achieved respectively. Thirdly, the integral image is segmented to buffer in different memory blocks in order to support multiple data accessing in one clock cycle, which will further reduce the whole calculating time of our implementation. The hardware architecture is implemented on an XC6VSX475T FPGA with 156 MHz and its maximal frame rate for VGA format image can reach 356 frames per second (fps), which is 6.25 times frame rate of OpenSURF running on a server with a Xeon 5650 processor, and 6 times the reported frame rate of the recent implementation on three Vritex4 FPGAs.
In this paper, a high-performance match engine for content-based image retrieval is proposed. Highly customized floating-point(FP) units are designed, to provide the dynamic range and precision of standard FP units, but with considerably less area than standard FP units. Match calculation arrays with various architectures and scales are designed and evaluated. An CBIR system is built on a 12-FPGA cluster. Inter-FPGA connections are based on standard 10-Gigabyte Ethernet. The whole FPGA cluster can compare a query image against 150 million library images within 10 seconds, basing on detailed local features. Compared with the Intel Xeon 5650 server based solution, our implementation is 11.35 times faster and 34.81 times more power efficient.