Subject Guides: Robots @ SIT: Interacting with Human by vision and voice

Introduction

A smart robot should be able to detect whether a person is in front of it and communicate with that person by voice. This requires speech-to-text and text-to-speech functions as well as facial recognition.

In this project, we will program Yanshee using Python to:

- Recognise a person

- Complete tasks based on voice instructions using the Yanshee API

- Read temperature sensors using the Yanshee API

Required Knowledge

Please ensure that you have gone through the resources below before starting this project:

1. Yanshee introduction video

2. Yanshee's Guide - Basic Python

3. Yanshee's API Document

Introduction to YanAPI

YanAPI is an API for the Python programming language. It lets you use Python to obtain robot status information and to design and control the Yanshee robot. The API wraps the complicated AI functions so that the programmer can call each of them with a single line of code.

Task 1: Recognising Faces and greeting

Step 1: Switch on Yanshee and connect to Yanshee using VNC viewer.

Step 2: Create your python file under and name it as <Your File Name>.py

Step 3: Initialise YanAPI by using the code below:

import YanAPI
import time

ip_addr = "127.0.0.1" # please change to your yanshee robot IP
YanAPI.yan_api_init(ip_addr)

The import statements tell Python that this program will reuse functions from the imported modules (YanAPI and time). The ip_addr variable holds the IP address of the robot that you are going to control. In most cases, we set it to 127.0.0.1, which refers to the current robot itself. If you wish to control more than one robot, you will need the IP addresses of the other robots. After that, call the YanAPI.yan_api_init function to initialise the connection to the robot.
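
Before an address is passed to YanAPI.yan_api_init, it can help to check that the string is a well-formed IP address. The following is a minimal sketch using Python's standard ipaddress module; the helper name validate_robot_ip and the second address in the list are hypothetical and not part of YanAPI.

```python
import ipaddress


def validate_robot_ip(ip_addr):
    """Return the cleaned address string, or raise ValueError if it is malformed."""
    # ip_address() rejects surrounding spaces, so strip them first.
    return str(ipaddress.ip_address(ip_addr.strip()))


# Hypothetical addresses: the robot itself plus one other robot on the network.
robot_ips = [validate_robot_ip(ip) for ip in ["127.0.0.1", " 192.168.1.23 "]]
print(robot_ips)
```

On the robot, the validated string would then be passed to YanAPI.yan_api_init in the same way as ip_addr above.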

Step 4: Recognise the face by adding the code below

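fr_res = YanAPI.sync_do_face_recognition("recognition")
# Assumption: the recognised name is returned under data -> recognition -> name
name_val = fr_res["data"]["recognition"]["name"]
if name_val != "" and name_val != "none":
    print("\nDetected: ")
    print(name_val)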

YanAPI reduces facial recognition to a single function call. If the face has already been added to the database, sync_do_face_recognition returns the result of the recognition.
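
The check for an empty name or "none" can be wrapped in a small helper so that the rest of the program only deals with a name or with nothing. This is a sketch only: the helper name extract_face_name is hypothetical, the sample dictionaries are made up for illustration, and the data -> recognition -> name layout is an assumption about the response that should be confirmed against the Yanshee API document.

```python
def extract_face_name(fr_res):
    """Return the recognised name, or None if no known face was seen."""
    # Assumed response layout: {"data": {"recognition": {"name": "..."}}}
    data = fr_res.get("data") or {}
    recognition = data.get("recognition") or {}
    name = str(recognition.get("name") or "").strip()
    if name == "" or name.lower() == "none":
        return None
    return name


# Made-up sample responses for illustration only.
sample_hit = {"code": 0, "data": {"recognition": {"name": "Alice"}}}
sample_miss = {"code": 0, "data": {"recognition": {"name": "none"}}}
print(extract_face_name(sample_hit))
print(extract_face_name(sample_miss))
```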

Step 5: Speak the greeting by calling the TTS API

tts_res = YanAPI.start_voice_tts("Hi "+name_val+", what can I do for you?",False)

That's it! Instead of writing hundreds of lines of code, you can use YanAPI to perform facial recognition and text-to-speech in just a few lines.
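
To try the Task 1 flow without a robot attached, the two API calls can be passed in through an object. The sketch below assumes the response layout data -> recognition -> name; FakeRobot and greet_person are hypothetical names, and the fallback greeting for an unknown face is an addition that is not part of the original steps.

```python
class FakeRobot:
    """Hypothetical stand-in for the YanAPI module, for off-robot testing."""

    def __init__(self, name):
        self.name = name
        self.spoken = []

    def sync_do_face_recognition(self, mode):
        # Assumed response layout; confirm against the Yanshee API document.
        return {"code": 0, "data": {"recognition": {"name": self.name}}}

    def start_voice_tts(self, text, interrupt):
        # Record the text instead of speaking it.
        self.spoken.append(text)
        return {"code": 0}


def greet_person(api):
    """Recognise a face, then greet by name or with a generic greeting."""
    fr_res = api.sync_do_face_recognition("recognition")
    name_val = fr_res.get("data", {}).get("recognition", {}).get("name", "")
    if name_val != "" and name_val != "none":
        text = "Hi " + name_val + ", what can I do for you?"
    else:
        text = "Hello, I do not recognise you yet."
    api.start_voice_tts(text, False)
    return text


print(greet_person(FakeRobot("Alice")))
```

On the robot itself, the same function could be called as greet_person(YanAPI) after YanAPI.yan_api_init has run.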

Task 2: Two-way communication by Listening and Speaking

In this task, we program Yanshee to listen for a voice command that adds a new face to the database. Yanshee then asks a question to get the person's name, takes a photo of the person in front of it, and adds the face to the database using the information it has collected.

Step 1: Create a function to capture an image and add it to the facial recognition database


def input_face_sample(name):
    tts_res = YanAPI.start_voice_tts("Taking photo, cheese", False)

    # Take a photo
    res = YanAPI.take_vision_photo()
    print(res)
    if res["code"] == 0:
        # Retrieve the photo image into a local folder
        path = "/tmp/"
        photo_name = res["data"]["name"]
        YanAPI.get_vision_photo(photo_name, path)
        photo = path + photo_name
        # Upload the photo to the facial recognition database
        YanAPI.upload_vision_photo_sample(photo)
        # Link the uploaded image with the person's name
        YanAPI.set_vision_tag([photo_name], name)
    else:
        print(res["msg"])

Step 2: Listen to the voice and translate it into Text.

listen_res = YanAPI.sync_do_voice_asr()

Step 3: Check whether the user said the "new face" command

if len(listen_res["data"]) > 0:  # the user said something
    question = listen_res["data"]['intent']['answer']['question']['question']
    if question.lower().strip() == "new face":  # check whether the user said "new face"
        pass  # the code from Steps 4 and 5 goes here
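
The nested lookup above raises an error if any level of the response is missing, for example when nothing was heard. A minimal sketch of a more defensive lookup follows; it uses the same key path as the step above, while the helper names extract_spoken_text and is_command, the sample dictionary, and the stripping of trailing punctuation are hypothetical additions.

```python
def extract_spoken_text(listen_res):
    """Walk data -> intent -> answer -> question -> question; return "" if absent."""
    node = listen_res.get("data") or {}
    for key in ("intent", "answer", "question", "question"):
        if not isinstance(node, dict):
            return ""
        node = node.get(key, "")
    return node.strip() if isinstance(node, str) else ""


def is_command(text, command):
    """Compare case-insensitively, ignoring surrounding spaces and punctuation."""
    return text.lower().strip(" .,!?") == command


# Made-up sample response for illustration only.
sample = {"data": {"intent": {"answer": {"question": {"question": " New Face. "}}}}}
print(is_command(extract_spoken_text(sample), "new face"))
```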

Step 4: Respond to the user and ask for the name of the new face.


tts_res = YanAPI.start_voice_tts("Sure, what is the name of the new face?", False)
time.sleep(2)  # sleep 2 seconds so the voice is not interrupted by the code below
listen_res = YanAPI.sync_do_voice_asr()  # listen again to get the name
name = listen_res["data"]['intent']['answer']['question']['question']
tts_res = YanAPI.start_voice_tts("Thank you. " + name + ", please face my camera.", False)
time.sleep(5)  # give the person time to face the camera
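
If the robot hears nothing after asking for the name, the lookup in this step fails. One possible way to handle that is to ask again a limited number of times. The sketch below uses only the two calls shown in this step; ask_for_name, FakeRobot, the retry prompt, and the sample responses are hypothetical.

```python
import time


def ask_for_name(api, attempts=2, pause=2.0):
    """Ask for a name up to `attempts` times; return it, or None if never heard."""
    prompt = "Sure, what is the name of the new face?"
    for _ in range(attempts):
        api.start_voice_tts(prompt, False)
        time.sleep(pause)  # let the prompt finish before listening
        listen_res = api.sync_do_voice_asr()
        data = listen_res.get("data") or {}
        try:
            name = data["intent"]["answer"]["question"]["question"].strip()
        except (KeyError, TypeError, AttributeError):
            name = ""
        if name:
            return name
        prompt = "Sorry, I did not catch that. What is the name?"
    return None


class FakeRobot:
    """Hypothetical stand-in for YanAPI that replays canned ASR responses."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.spoken = []

    def start_voice_tts(self, text, interrupt):
        self.spoken.append(text)

    def sync_do_voice_asr(self):
        return self.replies.pop(0)


heard = {"data": {"intent": {"answer": {"question": {"question": "Alice"}}}}}
robot = FakeRobot([{"data": {}}, heard])
print(ask_for_name(robot, attempts=2, pause=0))
```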

Step 5: Once the name is captured, detect the number of faces in the camera view to ensure that only 1 person is captured, then call the function created in Step 1


res = YanAPI.sync_do_face_recognition('quantity')
face_quantity = res['data']['quantity']

if face_quantity == 1:
    input_face_sample(name)  # call the function created in Step 1
else:
    tts_res = YanAPI.start_voice_tts("Sorry, only 1 face is allowed. Adding new face cancelled", False)
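
The face-count check can also be separated from the robot calls so that it is easy to test. In the sketch below, the helper names face_count and enrolment_message are hypothetical, a code of 0 is treated as success in the same way as in Step 1, and the separate message for zero faces is an addition to the original logic.

```python
def face_count(res):
    """Return the number of faces reported, or 0 if the call did not succeed."""
    if res.get("code") != 0:
        return 0
    data = res.get("data") or {}
    quantity = data.get("quantity", 0)
    return quantity if isinstance(quantity, int) else 0


def enrolment_message(count):
    """Return None when enrolment may proceed, otherwise the text to speak."""
    if count == 1:
        return None
    if count == 0:
        return "Sorry, I cannot see a face. Adding new face cancelled"
    return "Sorry, only 1 face is allowed. Adding new face cancelled"


# Made-up sample response for illustration only.
sample = {"code": 0, "data": {"quantity": 2}}
print(enrolment_message(face_count(sample)))
```

On the robot, input_face_sample(name) would be called when enrolment_message returns None, and the returned text would be passed to YanAPI.start_voice_tts otherwise.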

Download

Yanshee Voice and FR python source code
Python source code for Yanshee voice and FR project