Transcript: Justine Cassell on Non-Verbal Communication | Feb 04, 2001

A slate with two Doric columns reads "Justine Cassell. Media Laboratory M.I.T. ‘Why do we have a body anyway?’"

Justine stands behind a wooden lectern on a stage. She is in her mid-thirties, with very long frizzy brown hair in a half-do. She wears glasses, a gray top, a gray, beige and black blazer and a necklace.

A caption reads "Justine Cassell. Massachusetts Institute of Technology."

The caption changes to "Why do we have a body anyway?"

She says WHAT I'M
GOING TO DO IN TODAY'S TALK IS
TELL YOU SOMETHING ABOUT ONE OF
THE TRENDS IN CONTEMPORARY HCI,
HUMAN-COMPUTER INTERACTION,
AND IN FACT THEORIES OF
COMPUTATIONAL
SYSTEMS IN GENERAL.
I'M GOING TO TALK ABOUT WHY
EVEN THOUGH THAT'S A POPULAR
THEORY OR A POPULAR APPROACH
TODAY, I THINK IT'S PROBLEMATIC.
I'M GOING TO PRESENT SOME
DATA FROM HUMAN-HUMAN
INTERACTION TO TALK
ABOUT WHY THIS THEORY OF
HUMAN-COMPUTER INTERACTION
MIGHT POSE PROBLEMS, AND
THEN INSTEAD, I'M GOING TO
OFFER ANOTHER APPROACH TO
HUMAN-COMPUTER INTERACTION BASED
ON HUMAN-HUMAN INTERACTION.
I'LL WALK YOU THROUGH THE
PARALLEL BETWEEN HUMAN-HUMAN
AND HUMAN-COMPUTER AND
THEN I'LL SHOW YOU SOME
IMPLEMENTED SYSTEMS THAT ARE
BUILT ON THE BASIS OF THAT
THEORY OF HUMAN-COMPUTER
INTERACTION THAT IS MY OWN
THEORY THAT IS MINE.
SO WE HEAR A LOT THESE DAYS
IN NSF CALLS FOR PROPOSALS,
AS WELL AS IN ALL KINDS
OF OTHER PLACES ABOUT THE
IMPORTANCE OF SMART
ENVIRONMENTS.
SO IT'S IMPORTANT THESE DAYS
TO EMBED COMPUTATION IN
EVERY ASPECT OF OUR LIVES.
NOW I WANTED TO GIVE YOU
A REALISTIC SENSE OF HOW
PEOPLE APPROACH SMART
ENVIRONMENTS AND SO I'VE
TAKEN QUOTES FROM A NUMBER
OF DOCUMENTS THAT DISCUSS
SMART ENVIRONMENTS
OR SMART SPACES.
AND IN THE MEDIA LAB AT MIT,
THERE'S SOMETIMES A FEELING
OF A LITTLE BIT OF
COMPETITION WITH THE AI LAB,
ALSO AT MIT, AND SO IT
TURNS OUT THAT THERE'S WORK
BEING DONE ON SMART SPACES
IN THE AI LAB AND ON SMART
ROOMS IN THE MEDIA LAB, AND
I DIDN'T WANT ANYBODY TO
FEEL LEFT OUT, SO I'M GOING
TO CRITICIZE BOTH PROJECTS
BY TAKING QUOTES FROM BOTH
DOCUMENTS, AI LAB AND MEDIA
LAB, AND I'M NOT EVEN GOING
TO TELL YOU WHICH IS WHICH.
SO WE READ THAT WE WOULD
LIKE NOW TO CREATE SPACES
IN WHICH COMPUTATION IS
SEAMLESSLY USED TO ENHANCE
EVERYDAY ACTIVITIES.
WE WANT TO ALLOW HOMEOWNERS
TO SPEAK DIRECTLY TO THE
HOUSE AND ASK
IT TO DO THINGS.
WE WANT TO INCORPORATE
COMPUTERS IN THE REAL WORLD
AND ALLOW PEOPLE TO INTERACT
WITH THEM THE WAY THEY DO
WITH OTHER PEOPLE... AND THE
ITALICS ARE MINE, AND I'M
POINTING OUT HERE THAT
WHAT'S BEING SAID IS THAT WE
SHOULD BE ABLE TO INTERACT
WITH A SPACE IN THE WAY WE
INTERACT WITH ANOTHER HUMAN.
TO CONTINUE MORE OF WHAT WE
READ IS THAT THIS IS GOING
TO CONSTITUTE AN EMBEDDED
SYSTEM OF COMPUTERS,
CAMERAS, MICROPHONES, VOICE
AND SPEECH RECOGNITION AND
MACHINE VISION SOFTWARE.
AND THE OUTCOME WILL BE
ROOMS THAT RESPOND TO VERBAL
REQUESTS FOR DATA PROJECTED
ON A WALL, AND ONCE AGAIN WE
WANT TO TRACK PEOPLE AND
RECOGNIZE ACTIVITIES IN
THEIR NATURAL CONTEXT.
SO A FOCUS ONCE AGAIN ON
HOW PEOPLE INTERACT IN
A NATURAL CONTEXT.
BUT AS THE GREAT AUTHOR AND
WISE PERSON, HARRY POTTER,
TELLS US, 'NEVER TRUST
ANYTHING THAT CAN THINK FOR
ITSELF IF YOU CAN'T SEE
WHERE IT KEEPS ITS BRAIN'.
AND WHAT I'M GOING TO SAY IS
THAT THE PROBLEM WITH THESE
SMART SPACES IS THAT THEY'RE
INTELLIGENT BUT WE DON'T
KNOW WHERE THAT INTELLIGENCE
OR AGENCY OR MIND IS
LOCATED, AND THAT CAN MAKE
IT HARD FOR A HUMAN TO
ATTRIBUTE A MIND, TO HAVE A
THEORY OF MIND, TO ATTRIBUTE
REPRESENTATION TO THE SYSTEM
AND TO KNOW WHERE THE
INTELLIGENCE IS LOCATED.
AND I CAN ACTUALLY
GIVE YOU TWO EXAMPLES.
I WENT TO GIVE A TALK ABOUT
THIS IN FRANCE AND, FUNNILY
ENOUGH, ONE OF THE MAJOR
FRENCH RESEARCHERS WHO WORKS
ON SMART SPACES WAS
ESCORTING ME FROM MY HOTEL
TO THE CONFERENCE ROOM.
AND WE CAME TO THE
CONFERENCE BUILDING AND
THERE WAS THIS
BIG GLASS DOOR.
AND WE WALKED UP TO THE
GLASS DOOR AND IT DIDN'T OPEN.
AND SO MY HOST BEGAN WAVING
HIS HANDS FRANTICALLY AT
THIS GLASS DOOR.
WHY?
BECAUSE HE DIDN'T KNOW WHERE
ITS EYES WERE, SO HE DIDN'T
KNOW WHERE TO GESTURE TO THE
THING TO MAKE IT RESPOND TO HIM.
SO HERE'S AN INTELLIGENT
DOOR, BUT IT'S INTELLIGENT
IN AN INVISIBLE WAY.
AND LIKEWISE, JUST A COUPLE
OF DAYS AGO, THE MEDIA LAB
HAS NOW REPLACED ALL THE
FAUCETS IN THE BATHROOMS
WITH SMART FAUCETS.
NATURALLY.
WE HAD TO HAVE
SMART FAUCETS.
SO THE OTHER DAY I WAS IN
THE LADIES ROOM AND ONCE
AGAIN, THERE WAS ONE OF MY
ESTEEMED COLLEAGUES WAVING
HER HANDS FRANTICALLY AT THE
FAUCET TO MAKE IT TURN ON
BECAUSE IT HAS INTELLIGENCE
SOMEWHERE BUT WE DON'T KNOW
WHERE THAT INTELLIGENCE
IS LOCATED.
AND, IN FACT, THE WAY WE'RE
KIND OF BORN TO INTERACT AND
THE WAY WE DO AS CHILDREN
AND ADULTS INTERACT IN A
NATURAL CONTEXT IS TO GAZE
AT, TO USE OUR EYES, TO
ORIENT TO SOMETHING OR
SOMEBODY ELSE, TO GESTURE TO...
AND I'LL PROVIDE ABUNDANT
EXAMPLES OF THAT TODAY.
WE LIVE IN A REALITY THAT IS
SHARED WITH ARTIFACTS THAT
HAVE A PHYSICAL EXISTENCE.
WE HAVE A HARD TIME
INTERACTING WITH THE INVISIBLE.
OUR BODIES ARE LOCATED
IN SPACE, AND I BELIEVE
THEREFORE THAT INTERACTION
SHOULD BE LOCALIZED IN SPACE.
OUR HAND GESTURES
SPATIALIZE.
WE SAY, OKAY, IN KEY WORK ON
MACHINE VISION, AND I CAN
INDICATE SVEN.
NOW HE IS OBVIOUSLY NOT KEY
WORK ON MACHINE VISION IN
THAT CHAIR, BUT I CAN
SPATIALIZE THAT ASPECT OF
RESEARCH BY LOCALIZING
IT IN HIS CHAIR.
WE EXPRESS TOO.
OUR FACES ARE VERY
EXPRESSIVE OF EMOTION, BUT
EVEN MORE OF THE KIND OF
TURN-TAKING AND PROCESS
OF INTERACTION.
AGENCY OF THE KIND THAT
WE RESPOND TO SHOULD BE
VISIBLE IN ORDER FOR US TO
BE ABLE TO RESPOND TO IT.
AND OUR EYES DO THAT KIND
OF WORK BY ORIENTING.
WHEN I MAKE GESTURES AND
I LOOK AT MY GESTURES,
YOU WILL ALSO ORIENT YOUR
EYES TOWARDS MY HANDS.
AND WE DEMONSTRATE
TRUSTWORTHINESS WITH OUR
EYES, IT TURNS OUT, BY
LOOKING AT OTHER PEOPLE'S
EYES AND THEN LOOKING DOWN,
BY LOOKING AT THE TASK THAT
WE'RE ATTENDING TO.
AND THESE PRINCIPLES IN
HUMAN-HUMAN INTERACTION...
NOT PRINCIPLES, BUT THESE
ASPECTS OF HUMAN-HUMAN
INTERACTION ARE WHAT SUPPORT
COLLABORATION AMONG HUMANS
IN A VERY MINUTE MOMENT TO
MOMENT WAY AND SUPPORT TRUST.
NOW, YOU MAY SAY BUT WE
CERTAINLY INTERACT WITH
THE INVISIBLE.
FOR EXAMPLE, LOOK AT ALL
THE PEOPLE WALKING DOWN THE
STREET WITH CELLPHONES WITH
THOSE LITTLE THINGS IN THEIR
EARS AND THEY'RE WALKING
DOWN THE STREET HAVING
A CONVERSATION.
BUT AS I'M SURE YOU'VE
NOTICED QUITE STRIKINGLY,
THEY'RE GESTURING WITH
THAT CONVERSATION.
AND I WAS AMAZED TO FIND OUT...
LAST WEEK I GAVE A LECTURE
IN COSTA RICA AND IT WAS
SIMULTANEOUSLY INTERPRETED
AND THE INTERPRETER'S BOOTH
WAS AT THE BACK OF THE ROOM,
AND THE INTERPRETER GESTURED
WHILE SHE WAS INTERPRETING
MY TALK.
NOW NO ONE COULD SEE HER.
SHE WAS BEHIND ALL
THE SPECTATORS.
SO WE ARE CAPABLE OF HAVING
A CONVERSATION WHERE THERE IS
NO PERSON IN FRONT OF
US, BUT WE SPATIALIZE THAT
CONVERSATION TO A
PARTICULAR POINT IN SPACE.
WE LOCALIZE IT.
OKAY.
I'VE PUT UP HERE SOMETHING
THAT YOU MIGHT SAY IS A
RESPONSE TO THIS.
SO WHAT AM I MAKING SUCH A
BIG DEAL OUT OF THIS FOR?
WHY IS THIS THE KIND OF
THING THAT YOU TALK ABOUT IN
A COMPUTER SCIENCE
COLLOQUIUM?
ISN'T IT ALREADY PRODUCT?
HASN'T IT BEEN REDUCED TO
PRACTICE QUITE SIMPLY?
AND HERE'S AN EXAMPLE OF
WHAT'S BEEN TOUTED AS THE
HIGHEST ACHIEVEMENT OF
ARTIFICIAL INTELLIGENCE.

She points at a slide on the projection screen. A picture of a woman appears with the title "Ananova (2000)" along with some text.

Justine says THAT'S WHAT IT SAYS ON THE WEB PAGE.
YOU'LL BE GLAD TO KNOW
YOU CAN ALL GO HOME NOW.
THIS IS ANANOVA.
SHE WAS CREATED BY THE BBC
TO BE A NEWS READER AND
THERE'S A TECHNICAL... I
WENT LOOKING FOR TECHNICAL
INFORMATION ON ANANOVA AND I
HAVE TO TELL YOU THAT IN THE
TECHNICAL INFORMATION PART
OF THE WEB PAGES YOU'RE TOLD
THAT THE REASON SHE HAS
GREEN HAIR IS THAT IT
MATCHES HER SHIRT, AND THE
ONLY KIND OF ARCHITECTURE
DIAGRAM YOU CAN FIND IS THE
LAYOUT OF HER APARTMENT.
BUT NEVERTHELESS, SHE IS
TOUTED AS BEING A WAY OF
PERSONALIZING INTERACTION,
PERSONALIZING INFORMATION
AND BRINGING INTERACTIVITY
TO A WIDE RANGE OF PEOPLE
ON THE WEB.
AND LET ME SHOW YOU
HOW SHE DOES THAT.
THESE ARE HER
VERY FIRST WORDS.
THESE WERE HER VERY FIRST
WORDS TO HER ADORING PUBLIC.

Ananova says I CAN'T TELL YOU
HOW MUCH I'VE LOOKED FORWARD
TO THIS MOMENT.
I'VE BEEN LOCKED IN A ROOM
FOR TWELVE MONTHS WITH
NOTHING BUT GEEKS AND
TECHIES FOR COMPANY.

Justine says OKAY, LET'S...

The audience laughs.

Justine says I LOVE THIS BECAUSE I MEAN
THIS WAS INVENTED BY GEEKS
AND TECHIES, BUT THERE'S
SOME SENSE THAT THAT'S
NOT AN ACCEPTABLE THING
TO THE GENERAL PUBLIC.
AND THE BEST THING
ABOUT HER IS HER SNEER.
DID YOU NOTICE HOW THE
CORNER OF HER LIPS GO UP
WHEN SHE SAID 'GEEKS'?
I LOVE THIS.
I MEAN THIS IS CLEARLY
THE APEX OF AI.
OKAY.
SO SHE'S A NEWSCASTER.
A NEWSCASTER IS NOT A
PARTICULARLY INTERACTIVE
PERSON IN THE FLESH.
BUT TO BE TOUTED AS
INTERACTIVE, AND
PERSONALIZABLE, I THINK IS
PART OF A COMMON CONFUSION
BETWEEN PERSONALIZE
AND PERSON.
IT'S NOT PERSONALIZED JUST
BECAUSE IT'S REPRESENTED
AS A PERSON.
SO SHE DOES A GOOD JOB AT
SOME EMOTIONAL EXPRESSION,
LIKE THAT LITTLE SNEER
WHEN SHE SAYS 'GEEKS'.
SHE DOES NOT, HOWEVER, ORIENT
HER EYES TOWARDS THE TASK.
WHEN SHE'S TALKING ABOUT
NEWS SHE DOESN'T LOOK AT
NEWS THE WAY NEWSCASTERS DO
ON TELEVISION, FOR EXAMPLE.
SHE DOESN'T SIGNAL THE
PROCESS OF INFORMATION.
SHE DOESN'T DO ANY KIND
OF INFORMATION PACKING.
THERE ARE NO INFORMATION
PACKING CONSTRAINTS, FOR
EXAMPLE, BY LOOKING AT HER
INTERLOCUTOR... THAT WOULD BE
YOU... AND THEN
AWAY THE WAY WE DO.
WE CHUNK INFORMATION INTO
THE KINDS OF CHUNKS, IN FACT,
THAT ARE EASY FOR OUR
LISTENERS TO TAKE IN BY
USING OUR EYES AND OUR HANDS TO
STRUCTURE PIECES OF INFORMATION.
SHE DOESN'T DO THAT
AND, MOST OF ALL,
ALL OF THIS IS
HAND-ANNOTATED.
IT'S NOT AUTOMATICALLY
GENERATED.
SO I STILL THINK THAT WE CAN
KEEP OUR JOBS, CONTINUE OUR
GRADUATE SCHOOL WORK, AND
GO ON TO LOOK AT WHAT IT IS
ABOUT THE FACE TO FACE
METAPHOR THAT WE CAN USE
IN GOOD INTERFACE DESIGN.
AND I WOULD SIMPLY SAY ABOUT
ANANOVA AND SOME OF THE
OTHER SYSTEMS... LET'S JUST
NAME THE MICROSOFT PAPER
CLIP AS ANOTHER EXAMPLE...
THAT IF YOU'RE GOING TO USE
A FACE-TO-FACE METAPHOR YOU
HAVE TO FOLLOW THE METAPHOR
TO ITS CONCLUSION.
WELL DESIGNED INTERFACES
HAVE AFFORDANCES.
AND BY THAT IS MEANT VISIBLE
CLUES TO THEIR OPERATION.
BODIES HAVE VERY STRONG
CLUES, VERY STRONG
AFFORDANCES ABOUT THE KINDS
OF PROTOCOLS THAT THEY CAN
ENGAGE IN AND AN INTERFACE
THAT USES A BODY SHOULD RELY
ON THOSE PROTOCOLS.
THAT IS, GREEN
HAIR IS GREAT.
A NICE SHIRT IS JUST
WONDERFUL, BUT THE POINT OF
ADDING A BODY TO THE
INTERFACE IS NOT BEAUTY
BUT FUNCTION.
I, OF COURSE, HAVE TO KEEP
SAYING THAT BECAUSE WHEN YOU
SEE MY OWN WORK IT'S NOT
BEAUTIFUL NECESSARILY,
BUT IT DOES RELY ON THE
FUNCTION OF THE HUMAN BODY
AND ON THEORIES OF THE
FUNCTIONALITY OF THE HUMAN BODY.
AND THAT'S A GOOD THING
FOR US TO WORK ON BECAUSE
HUMAN-HUMAN CONVERSATION
PROVIDES US AN ABUNDANCE OF
NATURAL DATA TO BUILD
HYPOTHESES FROM AND WE CAN
THEN BUILD INTERFACES AND
TEST THOSE HYPOTHESES IN
THEIR INTERACTIONS
WITH USERS.
SO WHAT ARE
BODIES USEFUL FOR?
WHAT ARE THE AFFORDANCES?
WHAT'S THE FUNCTIONALITY?
BODIES GIVE
AVAILABILITY CUES.
SO IF YOU WALK INTO THE
STUDENT UNION AND THERE ARE
TWO PEOPLE IN CONVERSATION...
ME AND SUZANNE, FOR EXAMPLE.
AND YOU WALK INTO THE
STUDENT UNION AND YOU WANT
TO ASK ME A QUESTION ABOUT
SOMETHING YOU REALLY DIDN'T
BUY ABOUT THIS TALK.
AND I AM TALKING TO SUZANNE
AND I GLANCE OVER AND I
CONTINUE TALKING TO SUZANNE
AND YOU STAND THERE AND I
CONTINUE TALKING TO SUZANNE.
YOU'LL QUICKLY GET THE
IDEA THAT THIS IS NOT A
CONVERSATION THAT
YOU'RE INVITED INTO.
WHAT WE DO WHEN WE DO
INVITE SOMEBODY INTO A
CONVERSATION IS HARDLY
DIFFERENT, BUT
SIGNIFICANTLY DIFFERENT.
THERE I AM,
TALKING TO SUZANNE.
YOU WALK UP, I LOOK
AT YOU, I LOOK BACK.
I LOOK AT YOU
AGAIN, I SMILE.
TOSS MY HEAD AND LOOK BACK
AT SUZANNE, AND YOU'LL ENTER
INTO THE CONVERSATION.
IT'S A VERY QUICK CUE BUT
IT'S A CUE THAT'S SUFFICIENT
IN CONVERSATION TO ALLOW
US TO KNOW AVAILABILITY.
WE ENGAGE AND WE
DISENGAGE WITH THE BODY.
BY THE WAY, YOU'RE WRECKED
FOR NORMAL CONVERSATION
FROM NOW ON.
YOU'RE NOT EVEN GOING TO PAY
ATTENTION TO THE WORDS ANY
MORE IF I DO MY JOB WELL.
SO WHEN YOU GO UP TO
SOMEBODY AND YOU HAVE A
CONVERSATION, YOU'LL NOTICE
THAT YOU DON'T TALK STRAIGHT ON.
THAT WHEN IT COMES TIME TO
START THE CONVERSATION,
YOU'LL TURN YOUR BODY A
LITTLE BIT, 45 DEGREES, AWAY
FROM YOUR INTERLOCUTOR,
FROM THE PERSON THAT
YOU'RE SPEAKING WITH.
APPARENTLY ETHOLOGISTS TELL
US THAT THIS COMES FROM THE
ANIMAL KINGDOM, THAT IT IS
THREATENING TO LOOK STRAIGHT
ON AT YOUR INTERLOCUTOR.
BUT WHATEVER THE REASON,
YOU'LL NOTICE THAT WE ENGAGE
IN A CONVERSATION,
SURPRISINGLY, BY ORIENTING
OUR BODY A LITTLE
BIT TO THE SIDE.
AND WE DISENGAGE BY TURNING
TO FACE THE OTHER PERSON,
AND THEN WALKING AWAY.
ALSO CHECKING OUR
WATCH AND SO FORTH.
AND THAT DISENGAGING BY
CHECKING OUR WATCH IS A
REALLY INTERESTING CUE.
THERE'S A 3D CHAT SYSTEM
CALLED THE PALACE.
HAVE ANY OF YOU USED THIS?
OKAY.
SO WHEN YOU USE THE
PALACE, THERE ARE AVATARS,
REPRESENTATIONS OF
OTHER USERS, THAT ARE
AUTOMATICALLY GENERATING
NON-VERBAL BEHAVIOURS,
EMBODIED BEHAVIOURS.
BUT THEY'RE GENERATED ON THE
BASIS OF A RANDOM MODEL OF
THE KINDS OF
THINGS BODIES DO.
AND WHAT'S AMAZING IS ONE OF
THE KINDS OF RANDOM THINGS
THAT BODIES DO IS
CHECK THEIR WATCHES.
AND SO WHEN I WAS PLAYING
WITH THIS, THERE I WAS
HAVING THIS CONVERSATION
AND I WAS SAYING THE MOST
AMAZING THING HAPPENED TO
ME AND THE OTHER AVATAR
CHECKED ITS WATCH.
THIS IS NOT AN
APPROPRIATE CUE.
IT'S A MEANINGFUL CUE, AND
AS HUMANS, WE CAN'T HELP BUT
ATTRIBUTE MEANING TO THESE
KINDS OF EMBODIED CUES.
WE GIVE FEEDBACK.
WE GROUND AND WE
REPAIR WITH THE BODY.
WE NOD.
I WON'T DO THIS TO ALL OF
YOU, BUT IT'S VERY EASY WHEN
YOU'RE STANDING AT THE
FRONT OF THE ROOM TO USE
INTONATION AND THE BODY TO
ELICIT A ROOM-WIDE NOD.
ALL I HAVE TO DO IS SAY,
THIS MORNING I WAS AT SHERWOOD.
EVERYONE'S GOING... IF YOU
KNOW WHERE THAT IS, YOU'LL
NOD, BECAUSE I'VE GIVEN YOU
AN INTONATIONAL CUE AND AN
EYE CUE THAT
REQUESTS FEEDBACK.
ONCE THAT INFORMATION EXISTS
IN OUR COMMON GROUND,
I'LL LOWER MY HEAD.
I'LL TURN MY EYES AWAY AND I'LL
CONTINUE MY CONVERSATION.
WE TAKE TURNS
THAT WAY AS WELL.
WE GIVE OVER THE TURN BY
LOOKING AT THE OTHER PERSON,
COMING TO A PAUSE, BRINGING
OUR HANDS DOWN OUT OF
GESTURE SPACE AND
STOPPING SPEAKING FOR
500 MILLISECONDS OR MORE.
THAT'S A CUE THAT THE OTHER
PERSON CAN BEGIN SPEAKING.
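
A minimal sketch of the turn-giving cue she has just described: gaze at the listener, hands dropped out of gesture space, and a pause of 500 milliseconds or more. Only the threshold comes from the talk; the names and structure are assumptions.

```python
# A minimal sketch of the turn-giving cue described above; this is
# illustrative, not the lab's code.

GIVE_TURN_SILENCE_MS = 500  # threshold mentioned in the talk

def speaker_is_giving_turn(gazing_at_listener: bool,
                           hands_in_gesture_space: bool,
                           silence_ms: int) -> bool:
    """True when all three surface cues co-occur."""
    return (gazing_at_listener
            and not hands_in_gesture_space
            and silence_ms >= GIVE_TURN_SILENCE_MS)

# Gaze + lowered hands + 600 ms of silence: the floor is open.
assert speaker_is_giving_turn(True, False, 600)
# Averted gaze and active hands: hard to get a word in edgewise.
assert not speaker_is_giving_turn(False, True, 600)
```
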
IF ANY OF YOU HAVE HAD ANY
EXPERIENCE WITH SOMEBODY WHO
YOU WOULD DESCRIBE AS NOT
LETTING YOU GET A WORD IN
EDGEWISE, YOU CAN NOW LOOK
AT WHAT THAT PERSON DOES,
AND YOU'LL NOTICE THAT HE OR
SHE WILL NOT LOOK AT YOU.
SO I WENT ON THIS TRIP.
IT WAS REALLY INTERESTING
AND I SAW ALL OF THESE
FABULOUS THINGS, AND I'LL
TELL YOU ABOUT THEM.
AND IF THEY LOOK AROUND AND
DON'T LET THEIR EYES REST ON
YOUR FACE, DON'T BRING THEIR
HANDS OUT OF GESTURE SPACE,
YOU'LL FIND IT VERY
HARD TO INTERRUPT.
EMPHASIS AGAIN.
WHEN WE EMPHASIZE WORDS,
WE RAISE OUR EYEBROWS.
I HAD A GRADUATE STUDENT WHO
ACTUALLY WORKED ON CONTRASTIVE
STRESS, AND HE HAD HIS
VERY OWN EMPHASIS CUE.
SCOTT PREVOST.
STOOD UP ON HIS TOES.
HE'D SAY, "IN CONTRAST."
SPATIAL DEIXIS, THE WAY I
INDICATED SVEN WHEN I TALKED
ABOUT MACHINE VISION.
WE VERY OFTEN DO THAT.
AND, INTERESTINGLY, IT IS
MUCH LESS LIKELY IN EVERYDAY
CONVERSATION TO POINT TO
OBJECTS THAT DO EXIST IN THE
PHYSICAL SPACE THAN TO POINT
TO PLACES AND THINGS THAT
STAND IN FOR CONCEPTS AND
TIMES, BECAUSE WE KIND OF
LIVE IN ABSTRACT WORLDS AND
HAVE ABSTRACT CONVERSATIONS.
WE NEVER, ALTHOUGH YOU SEE A
LOT OF THIS KIND OF STUFF IN
EMBODIED PRODUCTS... WE NEVER
SAY THE SPEAKER IS FOR YOU
TO LISTEN TO.
IN FACT, WE USE DEIXIS WHEN
REFERENCE IS AMBIGUOUS.

She places her computer speakers separately on her desk.

She says SO I MIGHT SAY ONE OF THE
SPEAKERS IS BROKEN, AND ONLY
BY THE POINT DO YOU KNOW
WHICH OF THE SPEAKERS IS BROKEN.
WE WOULD NEVER, IF THERE'S
ONLY ONE SPEAKER, SAY THE
SPEAKER IS BROKEN, BECAUSE
YOU CAN SEE IT AS WELL AS I CAN.
AND WE USE ALL OF THESE
DIFFERENT MODALITIES
FOR TWO KINDS OF WORK.

A slide appears with the title "Some things bodies are useful for."

She says FOR PROPOSITIONAL WORK, THAT
IS FOR CONVEYING CONTENT,
OR COMMUNICATIVE INTENT,
TO ANOTHER PERSON OR TO
SEVERAL OTHER PEOPLE.
AND ALSO FOR CONVERSATIONAL
PROCESS, TO MODULATE OR
REGULATE HOW INFORMATION IS
CONVEYED, AT WHAT SPEED,
WHETHER IT'S UNDERSTOOD,
AND HOW IT'S TAKEN UP,
AND I'LL COME BACK
TO THAT IN A MOMENT.
BUT, FIRST, A QUIZ.
WATCH CAREFULLY.
YOU CAN TAKE OUT
A PIECE OF PAPER.
I'M GOING TO BE ASKING YOU
TO TURN THIS IN AT THE END
OF THE HOUR.
NOW I CAN EAT MY COOKIE.

A clip shows a woman talking and making hand-gestures.

Justine mumbles OKAY.
I'LL SHOW IT
TO YOU AGAIN.
[laughter]
I WAS GOING TO
DO THIS ANYWAY.
I'LL SHOW IT TO YOU AGAIN,
AND THEN I'M GOING TO SHOW
YOU A TRANSCRIPTION AND ASK
YOU TO IDENTIFY WHAT ASPECTS
OF WHAT SHE'S DOING WITH HER
FACE, HER HAND AND HER VOICE
HAVE TO DO WITH THIS PROCESS
AND PROPOSITIONAL CONTENT
OF INTERACTION.
SO NOW
REALLY
PAY ATTENTION.
I'LL SWALLOW.

She plays the video again.

She says OKAY.
WHAT DID YOU NOTICE?
SHE'S HAPPY.
SHE'S HAPPY.
SHE'S CERTAINLY SMILING.
THIS IS AN INTERESTING
THING ABOUT LOOKING AT THE
FUNCTION OF THE BODY.
AND I'M GOING TO COME BACK
TO THIS, BUT IT'S VERY
IMPORTANT TO SEPARATE OUT
SURFACE LEVEL BEHAVIOURS
FROM THEIR
PUTATIVE FUNCTION.
ONE OF THE MISTAKES OF
A LOT OF WORK THAT USES A
BODY AS THE INTERFACE IS TO
KIND OF HARD CODE SURFACE
BEHAVIOURS TO A FUNCTION.
SHE'S SMILING.
SHE'S HAPPY.
LET'S JUST SAY FOR THE
MOMENT SHE'S SMILING.
RISING INTONATION AT THE
END OF THE UTTERANCE.
SHE PAUSED AND SHE LET HER
EYES REST ON THE OTHER PERSON.
GREAT.
LET'S LOOK AT SOME
OF THOSE CUES.

A slide reads "Eyes and Head."

Justine says SO SHE SAID THE FIRST
FLOORS AND THEN SHE LOOKED
DIAGONALLY UP AND AWAY.
WE DO THAT DURING THE
PLANNING PHASE OF AN UTTERANCE.
APPARENTLY, COGNITIVE
PSYCHOLOGISTS TELL US THAT
THIS IS BECAUSE IT REDUCES
COGNITIVE LOAD DURING A
COMPUTATION INTENSIVE
PHASE OF PRODUCTION.
THAT IS, WE DON'T WANT TO
HAVE TO TAKE IN VISUAL
INFORMATION AT THE SAME
TIME AS WE'RE PLANNING
LINGUISTIC OUTPUT.
SHE SAYS IT'S LIKE A BOX AND
IT'S SURROUNDED BY A PORCH,
AND SHE TILTS HER HEAD AND
LOOKS AT HER LISTENER, AND
HER LISTENER GOES "MM-HMM."
SO SHE'S ELICITED
THAT FEEDBACK.
ALL AROUND... SO
ANOTHER PLANNING.
THE PORCH IS, I GUESS,
PRETTY MUCH SURROUNDING THE
WHOLE FLOOR AND THEN
SHE ROLLS HER EYES.
WHAT DOES SHE DO
WITH HER HANDS?
SO SHE'S USING HER HANDS TO
DESCRIBE THE GEOMETRY OF
WHAT SHE'S DESCRIBING.
THIS IS SO YOU KNOW THAT
YOU CAN TRY THIS AT HOME.
ALL YOU HAVE TO DO IS WRITE
2SH=C PTC FTO BEAT AND YOU
CAN DO EXACTLY THE SAME THING.
THIS IS A CODING SCHEMA THAT
I DESIGNED IN CONJUNCTION
WITH MY THEN ADVISOR, DAVID
MCNEIL, AND SOME OTHER
GRADUATE STUDENTS AT THE
UNIVERSITY OF CHICAGO.
2SH MEANS 2 SAME
HANDS EQUALS A C.
WE USE THE AMERICAN SIGN
LANGUAGE ALPHABET TO
DESCRIBE THE SHAPE
OF THE HANDS.
PTC, PALM TOWARDS CENTRE,
FTO, FINGERS TOWARDS OUT.
BEAT.
SO SHE GOES LIKE THIS.
THE FIRST FLOORS... TWO SAME
HANDS EQUAL A G... THAT'S THIS.
PALM TOWARDS DOWN,
FINGERS TOWARDS OUT.
IT'S LIKE A BOX.
I'VE GIVEN YOU AN
ABBREVIATED VERSION.
WE DO ALL KINDS OF TEMPORAL
EXTENT, PLACE ON THE BODY,
TRAJECTORY, WHEN WE'RE
DOING A FULL TRANSCRIPTION.
IT'S SURROUNDED BY A PORCH.
HER HANDS LOOSEN,
SAME GESTURE.
ALL AROUND.
THE PORCH IS, I GUESS,
PRETTY MUCH SURROUNDING THE
WHOLE FLOOR.
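
For following along at home, here is a hypothetical decoder for the abbreviated code she reads out. It expands only the abbreviations she defines in the talk; it is not the lab's actual transcription tooling.

```python
# A hypothetical decoder for the abbreviated gesture code above
# ("2SH=C PTC FTO BEAT"), expanding the abbreviations defined in the
# talk; not the lab's actual tool.

HANDEDNESS = {"2SH": "two same hands"}
ORIENTATION = {
    "PTC": "palm towards centre",
    "PTD": "palm towards down",
    "FTO": "fingers towards out",
}

def decode(code: str) -> dict:
    """Expand a coded gesture like '2SH=C PTC FTO BEAT' into prose."""
    fields = code.split()
    hands, _, shape = fields[0].partition("=")
    return {
        "hands": HANDEDNESS.get(hands, hands),
        "handshape": f"ASL letter {shape}",  # handshape uses the ASL alphabet
        "orientation": [ORIENTATION.get(f, f) for f in fields[1:-1]],
        "movement": fields[-1].lower(),      # e.g. a small rhythmic 'beat'
    }

print(decode("2SH=C PTC FTO BEAT"))
print(decode("2SH=G PTD FTO BEAT"))  # the "it's like a box" variant
```
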
WHY DO YOU THINK SHE REPEATS
IT OVER AND OVER AGAIN?
DID YOU SEE ANYTHING IN
THE LISTENER'S BEHAVIOUR
THAT WAS STRIKING?
THE LISTENER TILTED HIS HEAD
AND HE DIDN'T GIVE HER A
NOD, AND SHE CONTINUES
TO DESCRIBE IT.
IT HASN'T BEEN ACCEPTED INTO
THE COMMON GROUND UNTIL
HE FINALLY NODS.
I'M NOT SURE I'D DESCRIBE
IT AS EMOTIONALLY INSECURE,
UNLESS ALL OF US WOULD
DESCRIBE OURSELVES AS
EMOTIONALLY INSECURE
BECAUSE WE ALL DO THIS.
WE CAN'T HELP IT... IN FACT,
IT'S NOT A CONSCIOUS BEHAVIOUR.
IT'S UNCONSCIOUS IN THE
SAME WAY WE MIGHT THINK THAT
LANGUAGE IS UNCONSCIOUS.
THAT IS, WE HAVE AN INTENT
AND SURFACE LEVEL BEHAVIOURS
REALIZE THAT INTENT AND THAT
INTENT IS REALIZED EQUALLY
BY A SET OF GESTURES AND
FACIAL DISPLAYS AS BY SPEECH.
NOW WHEN I SAY EQUALLY, I
WANT TO DISPEL A COMMON MYTH
ABOUT GESTURE, AND THAT IS
THAT IT'S REDUNDANT TO SPEECH.

Another slide appears with the title "Hands."

Justine says IN FACT, IF YOU LOOK AT
DATA FROM HUMAN-HUMAN
INTERACTION, 50 PERCENT
OF GESTURES PRODUCED IN
NORMAL EVERYDAY CONVERSATION
ARE NON-REDUNDANT,
COMPLEMENTARY TO SPEECH.
SO, FOR EXAMPLE, I MIGHT SAY
IT TOOK US 30 MINUTES TO GET
HERE AND WE WERE REALLY
HURRYING,

She makes a "driving a car" hand gesture.

She says AS OPPOSED TO IT
TOOK US 30 MINUTES TO GET HERE
AND WE WERE REALLY HURRYING.

She makes a "walking" hand gesture.

She says OR A VERY COMMON GESTURE
THAT I SEE RECENTLY THAT
I DIDN'T USED TO SEE
IS I'LL LET YOU KNOW.

She makes a "typing" hand gesture.

She says I LOVE THAT.
I LOVE THAT.
BECAUSE YOU USED TO
SEE, I'LL LET YOU KNOW.

She makes a "phone calling" gesture.

She says THIS WAS A QUITE COMMON
GESTURE IN THE OLD DAYS,
IN MY TIME.
AND NOW THIS IS THE SAME
KIND OF COMMON GESTURE.
OKAY.
SO WHAT I'VE DONE THUS FAR
IS I'VE TOLD YOU ABOUT A
THEORY OF COMPUTATIONAL
SYSTEMS IN THE ENVIRONMENT,
AND I'VE POINTED OUT SOME
PROBLEMS BASED ON HOW HUMANS
INTERACT WITH OTHER HUMANS.
I'VE GIVEN YOU SOME DATA ABOUT
HUMAN-HUMAN INTERACTION.
NOW WHAT I NEED TO DO
IN ORDER TO MAKE THIS
COMPUTATIONALLY VALID,
INTERESTING FOR THAT MATTER,
IS TO INTEGRATE THESE
INSIGHTS ABOUT HUMAN-HUMAN
BEHAVIOUR INTO A MODEL, A
FORMAL MODEL, AND THIS IS
THE MODEL.

She shows a slide with the title "How to integrate these insights? FEMBOT model."

She says IT'S THE FEMBOT MODEL OF
HUMAN-COMPUTER INTERACTION
WHERE... I HAVE TO SAY,
I WISH I HAD THOUGHT OF
THIS, BUT FOR AGES, STUPIDLY
ENOUGH, I WAS GOING AROUND
TALKING ABOUT THE FBMT
MODEL, AND ONE OF MY
COLLEAGUES, MATTHEW STONE AT
RUTGERS, POINTED OUT TO ME
THAT IF YOU JUST DID ONE
LITTLE TEENY TRANSPOSITION,
YOU ENDED UP WITH THE FEMBOT
MODEL, AND I'M
THRILLED ABOUT THIS.
MY STUDENTS HATE IT.
THEY REFUSE TO TALK ABOUT
THEIR WORK AS BEING A PART
OF THE FEMBOT MODEL,
BUT THERE IT IS.
IT'S THE FEMBOT MODEL.
WHERE F STANDS FOR THE
FACT THAT WE'RE WORKING IN
FUNCTIONALITY HERE.
THESE ARE MULTI-MODAL
BEHAVIOURS.
THEY ARE HANDS AND FACE,
POSTURE, CONTENT OF VOICE,
SPEECH AND ALSO INTONATION.
AND I'M INSISTING HERE... AND
I'M GOING TO CONTINUE TO
INSIST... ON A SEPARATION
BETWEEN FUNCTIONALITY AND
BEHAVIOUR OR REALIZATION.
THAT IS, WE CAN GIVE
FEEDBACK BY NODDING,
OR BY SAYING "UH-HUH."
ONE FUNCTION, TWO SURFACE
LEVEL BEHAVIOURS.
ONE SURFACE LEVEL
BEHAVIOUR, TWO FUNCTIONS.
A HEAD NOD CAN MEAN I'M
FOLLOWING OR I AGREE.
IT'S IMPORTANT TO
KEEP THESE DISTINCT.
IT MAKES THE SYSTEM MORE
FLEXIBLE, AND I SHOULD ALSO
SAY THAT IT SIMPLIFIES THE
ADDITION OF A PERSONALITY
AND A LAYER THAT'S
ATTRIBUTABLE TO CULTURE.
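
A minimal sketch of that separation follows; the table entries come from her examples, and the realization logic is an assumption.

```python
# The function/behaviour mapping is many-to-many, so neither side can
# be hard-coded to the other. Entries from the talk; logic assumed.

FUNCTION_TO_BEHAVIOURS = {
    "give_feedback": ["head_nod", "say_uh_huh"],  # one function, two behaviours
}
BEHAVIOUR_TO_FUNCTIONS = {
    "head_nod": ["im_following", "i_agree"],      # one behaviour, two functions
}

def realize(function: str, style_filter=None) -> str:
    """Pick a surface behaviour for a function; a personality or
    culture layer can reorder or veto the options."""
    options = FUNCTION_TO_BEHAVIOURS[function]
    if style_filter:
        options = [b for b in options if style_filter(b)] or options
    return options[0]

print(realize("give_feedback"))                        # -> head_nod
print(realize("give_feedback", lambda b: "say" in b))  # -> say_uh_huh
```
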
SO I'M SURE ALL OF YOU ARE
SITTING THERE SAYING YES.
SHE'S AN AMERICAN.
NONE OF THIS HOLDS
FOR CANADIANS.
BUT IT'S NOT TRUE.
I CAN TELL YOU THAT I'VE
LOOKED AT TAPES FROM, I
GUESS, 13 OR 14 DIFFERENT
LANGUAGES, AND SPEAKERS OF
AMERICAN ENGLISH, CANADIAN
ENGLISH, AND BRITISH
ENGLISH, SO NYAH.
AND EVEN THE BRITS, WHO
INSIST THAT THEY DON'T
GESTURE, TURN OUT TO GESTURE
JUST AS MUCH AS EVERYBODY ELSE.
THE ONLY DIFFERENCE IS THAT
THEY INSIST THAT THEY DON'T.
[laughter]
AND THAT IN ALL OF THE
CULTURES THAT I'VE LOOKED AT,
BOTH WESTERN
AND NON-WESTERN,
I'VE LOOKED AT
TAGALOG AND SWAHILI.
I'VE LOOKED AT JAPANESE AND
CHINESE, ENGLISH, FRENCH,
SPANISH, PORTUGUESE
AND A BUNCH OF OTHERS.
YOU FIND ALL OF THE
FUNCTIONS OF GESTURE BUT THE
SURFACE LEVEL REALIZATIONS
ARE DIFFERENT.
MANY NORTH AMERICANS BEGIN A
STORY BY SAYING I'M GOING TO
TELL YOU ABOUT THE COOLEST
THING THAT HAPPENED TO ME.

She turns her hands with the palms facing the ceiling.

She says THERE IT IS.
COOL.
RIGHT BETWEEN MY HANDS.
CHINESE SPEAKERS... EQUALLY,
ACROSS CHINESE SPEAKERS...
OFTEN SAY, THIS IS A VERY
INTERESTING TALE I'M GOING
TO TELL YOU.

She moves her hands in circles over the desk.

She says NOW I DON'T KNOW WHY.
I DON'T ASSIGN MEANING TO
THOSE BEHAVIOURS BECAUSE
I DON'T KNOW.
BUT I DO KNOW THAT THE
BEHAVIOURS DIFFER, BUT THE
FUNCTIONALITY IS THE SAME.
AND THESE BEHAVIOURS ARE...
AND MUST BE REAL-TIME.
WHAT I MEAN BY REAL-TIME
IS A NUMBER OF THINGS.
FIRST OF ALL, THE
RESPONSIVENESS OF ONE PERSON
TO ANOTHER AND OF A SYSTEM
TO A PERSON MUST BE
EXTREMELY GOOD.
SO, FOR EXAMPLE, SUZANNE
SAYS TO ME I HAVE AN
UNDERGRADUATE
WORKING WITH ME.
I'D REALLY LIKE YOU TO CONSIDER
HER FOR GRADUATE SCHOOL.
I SAY IS SHE SMART?
SUZANNE PAUSES AND THEN
SAYS, REALLY SMART.
I DON'T TAKE THE
GRADUATE STUDENT.
SHE SAYS TO ME RIGHT AWAY,
REALLY SMART.
REALLY SMART.
AND I TAKE THE
GRADUATE STUDENT.
IN THIS SENSE, THE
SYNCHRONIZATION BETWEEN
BEHAVIOURS WITHIN ONE
PERSON IS LIKEWISE KEY.

A slide shows a chart with the caption "FEMBOT architecture. Functions, Modalities, Behaviours, Timing."

Justine says THAT LIKEWISE DEPENDS ON
THESE FOUR PROPERTIES,
FUNCTIONS, MODALITIES,
BEHAVIOURS, AND TIMING.
AND THIS IS THE GENERAL
ARCHITECTURE THAT WE USE
FOR ALL OF THE EMBODIED
CONVERSATIONAL AGENTS THAT
WE BUILD AND THAT I'M GOING
TO BE SHOWING YOU EXAMPLES OF.
AND WHAT'S IMPORTANT IS THAT
IT INTEGRATES THE INSIGHTS
FROM HUMAN-HUMAN
CONVERSATION THAT HAVE BEEN
FORMALIZED IN
THE FEMBOT MODEL.
IN TERMS OF RESPONSIVENESS,
THE INPUT MANAGER KNOWS THAT
CERTAIN BEHAVIOURS DON'T
NEED TO BE UNDERSTOOD
OR DELIBERATED UPON.
THEY SIMPLY NEED
TO BE REACTED TO.
SO WHEN THE USER SHOWS
UP INTO THE SPACE OF
INTERACTION, THE SYSTEM
NODS AS A WAY OF ACKNOWLEDGING
THAT THE USER IS THERE.
YOU DON'T NEED TO UNDERSTAND
WHAT THE USER'S PRESENCE MEANS.
YOU SIMPLY NEED A HARD-WIRED
REACTION AND THAT BYPASSES
THE DELIBERATIVE MODULE.
INPUT DEVICES ARE INTEGRATED
INSTANTLY IN THE INPUT...
NOT INSTANTLY.
IF ONLY IT WERE INSTANTLY.
THEY'RE INTEGRATED EARLY ON IN
THE PROCESS AND WHAT WE DO IS
WE PERIPHERALIZE BEHAVIOURS.
THAT IS, BEHAVIOURS ARE
TURNED INTO FUNCTIONALITY
EARLY ON IN THE
ARCHITECTURE.
THE CENTRE OF THE
ARCHITECTURE OPERATES ON
FUNCTIONS AND BEHAVIOURS...
FUNCTIONS ARE TRANSLATED
INTO BEHAVIOURS TOWARDS THE
END OF THE ARCHITECTURE WHEN
WE SCHEDULE ACTIONS AND SEND
THEM TO OUTPUT DEVICES.
THE ACTION SCHEDULER IS KEY
IN AN ARCHITECTURE LIKE THIS.
IT'S WHAT ENSURES
SYNCHRONIZATION
AMONG MODALITIES.
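
A schematic sketch of that flow follows. The event names and table contents are invented for illustration, but the shape (a reactive path bypassing deliberation, behaviours turned into functions at the periphery, functions turned back into behaviours by the scheduler) follows her description.

```python
# A schematic sketch of the pipeline described above; illustrative only.

def input_manager(event):
    if event == "user_enters_space":
        return ("react", "nod")  # reactive path: no understanding required
    return ("deliberate", behaviour_to_function(event))

def behaviour_to_function(event):
    # 'peripheralize': translate surface behaviour into function early on
    return {"user_gazes_at_agent": "requesting_attention"}.get(event, "unknown")

def deliberative_module(function):
    return {"requesting_attention": "greet_user"}.get(function, "noop")

def action_scheduler(function):
    # translate functions back into synchronized multimodal behaviours
    return {"nod": ["head_nod"],
            "greet_user": ["gaze_at_user", "smile", "say_hello"],
            "noop": []}[function]

for event in ["user_enters_space", "user_gazes_at_agent"]:
    path, payload = input_manager(event)
    behaviours = action_scheduler(payload if path == "react"
                                  else deliberative_module(payload))
    print(event, "->", behaviours)
```
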
NOW IN ORDER TO DO THIS, WE
CAME UP WITH A NOTION OF FRAME.
AND FRAME IS USED TO
MEAN, YOU KNOW, FOR FIVE
PEOPLE THERE ARE SIX
DEFINITIONS OF FRAME.
SO THE NOTION OF FRAME
THAT WE HAVE IS A WAY OF
INTEGRATING BEHAVIOURS INTO
A FUNCTION EARLY ON AND THEN
CARRYING THAT FUNCTION
THROUGHOUT THE SYSTEM,
BUT ALWAYS PAIRING A
PROPOSITIONAL FUNCTION WITH
AN INTERACTIONAL FUNCTION.
LET ME SHOW YOU AN EXAMPLE.

A slide appears with the title "Example of Frame."

Justine says OKAY.
SO WHEN WE GET A BEHAVIOUR
INTO THE SYSTEM, WE EARLY ON
KNOW THAT IT'S AN INTENT
SUCH AS TO REQUEST A
DESCRIPTION OF SOMETHING
AND THAT IT'S REQUESTING A
DESCRIPTION WHILE TAKING THE
TURN AND WANTING FEEDBACK
FOR THE TURN
THAT IT'S TAKEN.
SO AN ACTUAL EXAMPLE IS THE
AGENT REA IS TALKING TO THE
USER AND SHE'S TELLING THE
USER ABOUT HER IMPRESSIONS
OF A HOUSE.
SHE DOES THAT BY SAYING
THOSE THINGS AND THEN GIVING
THE TURN TO THE USER TO
REQUEST FEEDBACK ON THE
CONTENT THAT SHE'S PUT
INTO THE INTERACTION.
THIS SHOULD BE PRETTY
ABSTRACT FOR THE MOMENT,
BUT IT'S GOING TO GET
A LOT CLEARER NOW.
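
One way to render the frame idea in code, with the field names assumed rather than taken from the system:

```python
# A hypothetical rendering of the 'frame' notion: each unit pairs a
# propositional function with interactional functions, so content and
# conversation regulation travel through the system together.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Frame:
    propositional: str                   # the content at stake
    interactional: List[str] = field(default_factory=list)  # turn management

# Rea giving her impressions of a house, then handing over the floor:
frame = Frame(propositional="describe(house_1)",
              interactional=["take_turn", "request_feedback", "give_turn"])
print(frame)
```
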
THESE ARE SOME OF THE
EMBODIED CONVERSATIONAL
AGENTS THAT WE'VE BUILT
USING THIS THEORY
AND THIS ARCHITECTURE.

A slide shows several pictures and the title "Embodied Conversational Agents."

Justine says I PROBABLY WILL ONLY HAVE
TIME TO SHOW YOU ONE, BUT
I'VE GOT VIDEOS OF LOTS OF
THEM AND YOU'RE WELCOME TO
ASK ME TO SHOW THEM.
WE HAVE A SYSTEM CALLED
GRAND CHAIR THAT ENCOURAGES
OLD PEOPLE TO TELL THE
STORIES OF THEIR LIVES, AND
THIS IS PART OF A NEW TREND
IN PSYCHOLOGY OF THE ELDERLY
THAT SHOWS THAT REMINISCENCE
THERAPY IS EXTREMELY
VALUABLE AND IMPORTANT
FOR OLD PEOPLE.
IT'S ACTUALLY A RADICAL
DEPARTURE FROM WHAT USED TO BE
SAID, WHICH WAS THAT TALKING
ABOUT YOUR LIFE WAS BAD FOR
OLD PEOPLE BECAUSE
IT MADE THEM SAD.
AND IT'S NOW BEING
DISCOVERED THAT, ON THE
CONTRARY, TALKING ABOUT YOUR
LIFE AND INTEGRATING YOUR
EXPERIENCE IS IMPORTANT AND
IT'S LIKEWISE IMPORTANT FOR
YOUNG PEOPLE TO KNOW ABOUT
THEIR GRANDPARENTS' LIVES,
BUT GRANDPARENTS OFTEN
DON'T LIVE WITH THEIR
GRANDCHILDREN ANYMORE.
SO THIS IS A VIRTUAL
GRANDCHILD WHO ELICITS...
[laughter]
STORIES OF THE OLD PERSON'S
LIFE, AND THEN THE WHOLE
INTERACTION IS TAPED
FOR THE GRANDCHILDREN.
IT'S A VERY FUNNY
INTERACTION.
AND WE ACTUALLY... IT'S PART
OF A NEW PARADIGM THAT WE'VE
BEEN WORKING ON CALLED
SHARED REALITY WHERE
EMBODIED CONVERSATIONAL
AGENTS ARE NOT AT ALL
VIRTUAL REALITY BUT PART
OF THE PHYSICAL SPACE.
THIS ARISES NATURALLY
FROM OUR EMPHASIS ON
REPRESENTATION AND
LOCALIZATION IN THE USE OF
SPACE, PART OF OUR
THEORETICAL COMMITMENT TO
THOSE PROPERTIES.
WE REALLY NEED TO HAVE A
SPACE IN WHICH TO INTERACT,
AND SO WE OFTEN THESE DAYS
INTEGRATE PHYSICAL OBJECTS
INTO THE CONTEXT.
THIS IS A ROCKING CHAIR
THAT'S WIRED UP SO THAT WHEN
YOU LEAN FORWARD, JENNY,
THE GRANDCHILD, KNOWS THAT
YOU'RE STARTING A NEW TOPIC
BECAUSE OUR HUMAN-HUMAN DATA
COLLECTION SHOWED THAT WHEN
PEOPLE ARE ABOUT TO START A
NEW TOPIC, THEY LEAN FORWARD,
SO THE CHAIR KNOWS THAT.
THAT GIVES ME THE
OPPORTUNITY TO SAY THAT THE
VOYAGE THAT I JUST TOOK YOU
THROUGH IS WHAT WE DO FOR
EACH PROJECT THAT WE BUILD.
WE START OUT BY BRINGING
PEOPLE INTO OUR LAB TO HAVE
CONVERSATIONS WITH NO
TECHNOLOGY AROUND.
WE WORKED WITH THE MELROSE
CENTRE FOR THE ELDERLY TO
LOOK AT LIFE STORIES AND
TAPED A LOT OF LIFE STORIES
BETWEEN GRANDPARENTS
AND GRANDCHILDREN.
WE LOOK AT OUR DATA AND COME
UP WITH HYPOTHESES ABOUT THE
INTERACTION BETWEEN VERBAL
AND NON-VERBAL BEHAVIOURS,
FORMALIZE THOSE INTO A MODEL
THAT ALLOWS US TO IMPLEMENT
A SYSTEM THAT WILL TAKE THE
ROLE OF ONE OF THE HUMANS
IN THE INTERACTION.
REA.
I'M GOING TO TELL
YOU MORE ABOUT REA.
MAC.
THIS IS A NEW SYSTEM.
HE'S A CONVERSATIONAL KIOSK.
MAC STANDS FOR MEDIA LAB
AUTONOMOUS CONVERSATIONAL
KIOSK, BUT THE L IS SILENT.
AND MAC ALLOWS PEOPLE TO
WALK UP... VISITORS TO THE
MEDIA LAB GET A MAP, A
KIND OF A 4-PAGE MAP, AND
IT'S INCREDIBLY OBSCURE.
AND THEY CAN WALK UP WITH THEIR
MAP AND PUT IT ON THE TABLE.
THERE'S A WACOM TABLET
UNDERNEATH, AND MAC KNOWS
THEREFORE WHAT THEY'RE
POINTING TO ON THE MAP.
THERE'S INFRARED THAT TELLS
WHAT PAGE PEOPLE ARE ON.
AND HE CAN DESCRIBE PROJECTS
AND PEOPLE AT THE MEDIA LAB,
BOTH WITH GESTURE AND BY
INDICATING THINGS ON THE
USER'S MAP.
SAM IS A SYSTEM
FOR CHILDREN.
ONCE AGAIN, ANOTHER SHARED
REALITY SYSTEM WHERE A
VIRTUAL CHILD CAN PLAY
WITH A REAL CHILD AROUND A
PHYSICAL CASTLE AND THE
VIRTUAL CHILD AND REAL CHILD
CAN PASS TOYS BACK AND
FORTH FROM THE VIRTUAL
TO THE REAL WORLD.
YOU'RE GOING TO HAVE TO
ASK ME ABOUT THAT DURING
THE QUESTION PERIOD.
AND SITU CHAT, WHICH IS A
WAY OF APPLYING THE SAME
INSIGHTS TO CHAT AND THERE
WHAT WE'RE INTERESTED IN IS
HOW MUCH INFORMATION ABOUT
FUNCTIONALITY WE CAN EXTRACT
FROM TYPED TEXT.
TEXT TYPED BY USERS.
AND HOW WE CAN USE THE
FUNCTIONALITY THAT WE
EXTRACT TO THEN GENERATE
NON-VERBAL BEHAVIOUR THAT'S
REPRESENTED ON
A USER'S AVATAR.
OKAY.
SO NOW I'M GOING TO GIVE
YOU, AS AN EXAMPLE, ONLY ONE
OF THESE SYSTEMS, WHICH IS
THE MOST COMPLEX SYSTEM
CALLED REA AND WE DON'T
BUILD EVERYTHING, AND HERE
ARE SOME OF THE INPUT-OUTPUT
COMPONENTS THAT WE USE.

A slide appears with the title "Example: Rea IO Components."

Justine says THE VISION SYSTEM IS STIVE.
THAT'S A SYSTEM BUILT BY
SANDY PENTLAND AND SOME OF
HIS STUDENTS AT
THE MEDIA LAB.
AND A NEW SYSTEM CALLED
GESTERP BUILT BY ONE OF MY
GRADUATE STUDENTS, LEE
CAMPBELL, WHO'S ABOUT TO
GET HIS PhD.
WE HOPE.
AND YOU'LL SEE MORE ABOUT
THAT SYSTEM AS I TALK.
WE'RE USING AUDIO THRESHOLD
SO THAT WE CAN INSTANTLY
KNOW WHEN THE USER IS
SPEAKING AND WE CAN REACT TO
THAT WITH A NOD BEFORE
THE SPEECH IS PROCESSED
AND UNDERSTOOD.
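
A toy version of that audio-threshold reaction, with an illustrative threshold value and class:

```python
# The nod fires as soon as energy crosses a threshold, long before the
# recognizer returns words. Threshold and class are illustrative.

AUDIO_ENERGY_THRESHOLD = 0.1  # assumed units: normalized RMS energy

class Agent:
    def react(self, behaviour: str) -> None:
        print("react immediately:", behaviour)

def on_audio_frame(rms_energy: float, agent: Agent) -> None:
    if rms_energy > AUDIO_ENERGY_THRESHOLD:
        agent.react("head_nod")  # hard-wired; full ASR runs in parallel

on_audio_frame(0.3, Agent())  # the user starts speaking -> instant nod
```
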
AND FOR SPEECH RECOGNITION,
WE'RE USING IBM VIAVOICE,
BUT WE'RE CURRENTLY
EXPERIMENTING WITH MOVING TO
SUMMIT WHICH IS THE MIT
AI LAB SYSTEM BUILT BY
VICTOR ZUE, JIM GLASS,
AND THEIR COLLEAGUES.
AND FOR OUTPUT, THE
ANIMATION IS WRITTEN IN
OPENGL AND FOR OUR TEXT TO
SPEECH, WE'VE JUST MOVED TO
BRITISH TELECOM'S FESTIVAL.
I'M NOT PARTICULARLY HAPPY
WITH THE QUALITY OF THE
VOICE AND YOU'LL
PROBABLY AGREE WITH ME.
BUT BECAUSE OF THE ISSUES OF
REAL-TIME SYNCHRONIZATION,
WE NEED A TEXT TO SPEECH
ENGINE THAT GIVES US BACK
PHONEME TIMINGS.
WE NEED EVENT TIMINGS SO
THAT WE CAN TIME GESTURE TO
SPEECH AT THE PHONEME LEVEL
AND FESTIVAL'S ONE OF THE
FEW SYSTEMS THAT
WILL DO THAT.
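
A minimal sketch of why those phoneme timings matter for scheduling, with made-up timing data:

```python
# Given the timings a Festival-style engine returns, the action
# scheduler can land a gesture stroke on a particular phoneme.
# The data values here are made up.

phoneme_timings = [  # (phoneme, start_ms, end_ms) for the word "condo"
    ("k", 0, 80), ("aa", 80, 220), ("n", 220, 300),
    ("d", 300, 360), ("ow", 360, 520),
]

def stroke_time(target_phoneme: str) -> int:
    """Time the gesture stroke to start with the target phoneme."""
    for phoneme, start, _end in phoneme_timings:
        if phoneme == target_phoneme:
            return start
    return 0

print("launch gesture stroke at", stroke_time("aa"), "ms")  # -> 80 ms
```
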
OKAY.

Another slide appears with the title "Example of Interaction."

Justine says SO NOW YOU HAVE
ALL OF THIS THEORY.
YOU HAVE AN ARCHITECTURE.
LET'S LOOK AT HOW THIS
IS USED IN AN ACTUAL
INTERACTION BETWEEN
A USER AND REA.
AND SO THIS MAKES SENSE TO
YOU, FIRST I'M GOING TO WALK
YOU THROUGH WHAT HAPPENS
INSIDE THE SYSTEM,
AND THEN I'LL SHOW
YOU A VIDEO.
SO THE USER HAS COME UP TO
REA AND ASKED FOR A PLACE
IN BOSTON.
SO LET ME GIVE YOU A LITTLE
BIT OF BACKGROUND HERE.
REA IS A REALTOR, SO REA
STANDS FOR REAL ESTATE AGENT.
AND PART OF THAT IS BECAUSE
WE COULDN'T STOP OURSELVES
FROM MAKING THE PUN,
VIRTUAL REALTY AGENT.
AND SO REA HAS ACCESS TO
A DATABASE OF HOUSES
IN THE BOSTON AREA.
SO A USER JUST WALKS UP
AND SAYS, I'M LOOKING FOR
A PLACE IN BOSTON.
THE USER'S INPUT IS I'M
LOOKING FOR A PLACE IN
BOSTON, AND THEN THE
USER STOPS SPEAKING.
THE INPUT MANAGER TAKES THAT
INFORMATION AND INTEGRATES
IT INTO A FRAME, SENDS IT TO
THE UNDERSTANDING MODULE.
THE UNDERSTANDING
MODULE KNOWS THAT THE
CONVERSATIONAL STATE IS THAT
IT'S THE USER TURN AND THAT
THIS IS GENERATING SOME
KIND OF OBLIGATION THAT THE
SYSTEM MUST RESPOND TO.

A slide shows a chart with the caption "Understanding. Input: ‘I’m looking for a place in Boston.’ Prop: Request Place. Inter: Giving Turn."

Justine says THE DELIBERATIVE MODULE GOES
FURTHER AND KNOWS THAT THE
PROPOSITIONAL CONTENT IS
THAT THE USER IS REQUESTING
A PLACE IN BOSTON AND THAT
THE INTERACTIONAL CONTENT TO
COME AT THE END OF THIS IS
TO GIVE OVER THE TURN TO
REA, SO REA'S TURN IS BOTH...
REA'S GOING TO HAVE AN
OBLIGATION AND THEN SHE'S
GOING TO HAVE TO TAKE THE TURN.
AND THE OBLIGATION IS GOING
TO BE TO OFFER A HOME TO THE
USER IN THE BOSTON AREA.
HOME NUMBER ONE.
AND TO DESCRIBE THAT HOME.
SO WHAT SHE DOES, THE
GENERATION MODULE THEN
GENERATES A PHRASE THAT WILL
OFFER A HOME IN THE BOSTON
AREA AND THAT WILL TAKE THE
TURN USING THE FULL SET OF
EMBODIED BEHAVIOURS
TO DO THAT.
AND THOSE BEHAVIOURS ARE
SENT TO THE ACTION SCHEDULER
WHICH GENERATES A GLANCE UP
AND AWAY FOR THE PLANNING
PHASE, AND A LOOK BACK AND
SHE SAYS I HAVE A CONDO.
AND THE REASON SHE GENERATES
A BEAT IS BECAUSE THIS IS
RHEMATIC INFORMATION
OR NEW INFORMATION.

Another slide shows a chart with the caption "Generation. Prop: Offer-Home 1. Inter: Take-Turn. Output: Glance up-away. ‘I have a (beat) condo’."

Justine says IT'S THE FIRST TIME THIS
DISCOURSE ENTITY HAS BEEN
MENTIONED AND PEOPLE
GENERATE A GESTURE ALONG
WITH THEIR FIRST MENTION
OF A NEW DISCOURSE ENTITY.
IF THE ENTITY IS EASILY
DESCRIBABLE, THERE'S SOME
ASPECT THAT CAN BE DESCRIBED
OVER AND ABOVE THE CONTENT
CONVEYED IN SPEECH, THEY'LL
GENERATE AN ICONIC GESTURE
TO REPRESENT IT.
BUT IF NOT, THEY'LL STILL
GENERATE A LITTLE BEAT
OF THE HAND.
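
The rule she has just stated can be sketched as follows; the names are assumptions, not the generation module's API.

```python
# A first-mention (rhematic) discourse entity gets a gesture: iconic if
# some feature can be depicted over and above the speech, else a beat.

from typing import Optional, Set, Tuple

def select_gesture(entity: str, mentioned: Set[str],
                   depictable: Optional[str]) -> Optional[Tuple[str, str]]:
    if entity in mentioned:
        return None                      # given information: no gesture
    mentioned.add(entity)
    if depictable:
        return ("iconic", depictable)    # e.g. 'high up in a building'
    return ("beat", "")                  # new, but nothing to depict

seen: Set[str] = set()
print(select_gesture("condo", seen, None))          # ('beat', '')
print(select_gesture("building", seen, "high_up"))  # ('iconic', 'high_up')
print(select_gesture("condo", seen, None))          # None: second mention
```
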
SO SHE DESCRIBES THE HOME.
HOWEVER, THAT'S A TWO
UTTERANCE JOB, AND SO SHE
KEEPS THE TURN BY
LOOKING BACK AWAY.
GENERATES ANOTHER UTTERANCE...
AND I'M NOT GOING TO GO
INTO THIS IN DETAIL,
ALTHOUGH WHEN THE
COMPUTATIONAL LINGUISTICS
GROUP MEETS LATER, I'M MORE
THAN HAPPY TO TALK ABOUT
THIS BECAUSE WE'RE REALLY
EXCITED ABOUT THIS WORK.
WE'VE TAKEN SPUD, WHICH
IS MATTHEW STONE'S NATURAL
LANGUAGE GENERATION ENGINE,
AND WE'VE... I WAS GOING TO
SAY TWEAK, BUT TWEAK IS NOT
SOMETHING YOU DO OVER A YEAR
AND A HALF.
SO WE'VE WORKED ON IT SO
THAT IT GENERATES IN REAL TIME
BOTH AN UTTERANCE AND AN
APPROPRIATE GESTURE FROM
ONE SINGLE UNDERLYING
CONCEPTUAL REPRESENTATION.
AND WHAT REA SAYS IS IT'S IN
A BUILDING... IT'S IN A CONDO
IN BOSTON.
I THINK ACTUALLY
THAT'S A TYPO.
SHE SAYS IT'S IN A
BUILDING IN BOSTON.

Justine raises her hand.

She says AND ONLY IN HER GESTURE DO
YOU SEE THAT IT'S HIGH UP IN
A BUILDING IN BOSTON, AND
THEN SHE GLANCES AWAY,
AND SHE GIVES THE TURN
OVER TO THE USER.
AND LOOKS AT THE USER,
AND THAT'S A CUE FOR
USER TAKE TURN.
OKAY.
WHY DON'T WE WATCH THIS?

A clip rolls with the caption "Rea: Experiment in Virtual Reality. House Tour."

A man says TELL ME
ABOUT THE HOUSE.

Rea says IT HAS TWO STOREYS
AND THERE ARE STAIRS
IN THE MIDDLE.

The man says HOW FAR
IS THE HOUSE?

Rea says IT IS FIVE MINUTES TO
THE PORTER SQUARE T STOP.

She makes a "walking" hand gesture.

Justine says CAN YOU SEE THAT
BEAUTIFUL GESTURE?

Rea says IT HAS TWO BEDROOMS,
ONE KITCHEN, THREE BATHROOMS,
AND LIVING ROOM.

The man says TELL ME ABOUT
THE KITCHEN.

The man says SHOW ME THE KITCHEN.

Rea says HERE'S A CHIMNEY.

Justine says SHE STILL CAN'T TWIST HER
TORSO, SO SHE HAS TO POINT
OVER HER SHOULDER.
CAN YOU SEE THE CLIPBOARD
THAT SHE'S LOOKING AT?

Rea says IT HAS CLASSIC
VICTORIAN STYLE,
BUT ALSO MODERN AMENITIES.

Justine says I
WONDER IF WE CAN...

The man says TELL ME ABOUT
THE VICTORIAN STYLE.
YAY.

Rea says THE LIVING ROOM HAS
ORIGINAL HARDWOOD FLOORS.

The man says TELL ME ABOUT
THE MODERN AMENITIES.

Rea says THE LIVING ROOM
HAS AN INTERCOM
AND DESIGNER-BUILT
PLAY ROOM.

The man says WHERE IS THE BEDROOM?

Rea says IT IS UPSTAIRS.

The man says WHERE IS THE BATHROOM?

Rea says IT IS NEXT TO THE BEDROOM.

The man says TELL ME ABOUT
THE BATHROOM.

Rea says THERE IS A JACUZZI
IN THE BATHROOM.

Justine says BUBBLES.

The man says SHOW ME
THE BATHROOM.

Rea says HERE'S THE JACUZZI.

The man says I CHANGED MY MIND.

Rea says OKAY.

The man says I'M LOOKING FOR A CONDO.

Justine says OKAY,
WE'LL STOP THERE BEFORE
SHE GIVES HIM A CONDO TOUR.
I WANT TO TELL YOU A COUPLE
OF THINGS ABOUT WHAT YOU'VE
JUST SEEN.
AS I TOLD YOU,
SYNCHRONICITY IS KEY,
REAL-TIME RESPONSIVENESS
IS KEY.
THESE THINGS HAVE TO
HAPPEN REALLY QUICKLY.
AT THE SAME TIME, WE'RE
USING A NATURAL LANGUAGE
GENERATION ENGINE WHICH
ALTHOUGH WE WOULD LIKE IT TO
BE REALLY SPEEDY, IS NOT
NECESSARILY SO SPEEDY,
AND SO WHAT WE DID WAS
WE GAVE HER A CLIPBOARD.
WHEN THE NATURAL LANGUAGE
GENERATION ENGINE IS
WORKING, SHE LOOKS AT HER
CLIPBOARD AND THAT SHOULD
ORIENT YOUR EYES TO THE
CLIPBOARD WHILE SHE'S
GENERATING WHAT TO SAY.
YOU'VE NOTICED A BUNCH OF
DIFFERENT KINDS OF GESTURES.
SHE'S GENERATING BEAT GESTURES
WHEN THERE'S NO OTHER KIND
OF GESTURE THAT MAKES
SENSE TO GENERATE.
BUT SHE'S GENERATING
GESTURES THAT HAVE TO DO
WITH THE IMPRESSION OR THE
ASPECT OR THE SURPRISING
FEATURE OF A NEW
DISCOURSE ENTITY.
SO WHEN HE ASKS HER TO TELL
HIM ABOUT THE KITCHEN,
SHE SAYS THERE'S
A CHIMNEY IN IT.
NOW THAT'S REALLY FUNNY
BECAUSE IT'S NOT THE FIRST
THING THAT COMES TO OUR MIND
TO DESCRIBE WHEN YOU SAY
TELL ME ABOUT THE KITCHEN.
BUT TO HER, IT'S THE MOST
SURPRISING FEATURE OF WHAT
SHE FINDS IN THE DATABASE
ABOUT THE KITCHEN.
YOU'VE ALSO SEEN HER GENERATE
SOME COMPLEMENTARY GESTURES
WHEN SHE SAYS IT'S FIVE
MINUTES FROM THE PORTER
SQUARE T STOP, AND YOU'VE
SEEN HER USE HER HEAD TO
REGULATE THE
CONVERSATION WITH TIM.
NOW WHAT'S INTERESTING TO
NOTE HERE IS THAT IT'S SLOW.
IT'S NOT AS QUICK AS WE
WOULD LIKE, AND WE'RE WORKING
ON OPTIMIZING THE SYSTEM
SO AS TO MAKE IT QUICKER.
BUT IN THE MEANTIME, WHAT
WE FIND IS THAT USERS
ENTRAIN TO HER SPEED.
THAT IS, IN OUR EVALUATIONS
OF THE SYSTEM... AND I'LL GET
ONTO THE QUESTION OF
EVALUATION MORE COMPLETELY
IN A COUPLE OF MINUTES... BUT
IN OUR EVALUATIONS OF THE
SYSTEM, WHAT WE FIND IS
THAT PEOPLE BEGIN TO SPEAK
AT HER SPEED.
THEY GESTURE TO HER WITHIN
AROUND TWO TURNS, AND IT'S
REALLY WEIRD TO WATCH THIS,
BECAUSE PEOPLE WILL WALK UP
AND THEY'RE IN FRONT OF A
SCREEN SO THEY BEGIN BY JUST
KIND OF LOOKING DOWN AND,
YOU KNOW, THEY LOOK AT THE
SCREEN AND THEY KIND OF
STARE AT THE SCREEN, AND
THEN REA COMES ON AND
SHE SAYS, OH, YEAH.
I CAN SHOW YOU A HOUSE
AND THEY SAY, OH REALLY?
WHERE IS IT?
AND ALL OF A SUDDEN, THEY'RE
ACTING AS IF THIS IS A HUMAN.
THEY'RE POINTING AT PLACES
THAT SHE CAN'T SEE BECAUSE
SHE'S A CARTOON, AND
NEVERTHELESS, THEY POINT
AT IT, AND THIS NOTION OF
ENTRAINMENT, WHICH MEANS TO
INCREASINGLY SYNCHRONIZE
ONE'S CONTRIBUTIONS, WE'D
LIKE IT TO BE THE CASE THAT
BOTH THE SYSTEM AND THE USER
ENTRAIN TO ONE ANOTHER.
BUT FOR THE MOMENT, THAT NOT
BEING POSSIBLE, WE DO FIND
THAT USERS GET INTO
SYNC WITH THE SYSTEM.
OKAY.
ONE OF THE THINGS THAT
YOU SAW HERE THAT'S QUITE
INTERESTING, AND I JUST WANT
TO SHOW YOU A VERY QUICK
CLIP OF IT... IS SOME
GESTURE RECOGNITION.
GESTURE RECOGNITION IS A
REALLY BIG TOPIC IN THE
MACHINE VISION COMMUNITY,
BUT WE'RE TAKING A VERY
DIFFERENT APPROACH.
WE'RE TAKING A FUNCTIONAL
APPROACH TO THE RECOGNITION
OF GESTURE.
THAT IS, WE'RE SAYING
SURFACE LEVEL BEHAVIOURS ARE
REALLY HARD TO RECOGNIZE.
WE CAN'T SEE WHAT THE USER
IS DOING WITH HIS FINGERS,
BUT WE BELIEVE THAT WE CAN
RECOGNIZE THE FUNCTION OF
THE GESTURE TO INTRODUCE A
NEW ENTITY, FOR EXAMPLE.
AND THIS IS AN
EXAMPLE OF THAT.
THIS IS SOMEBODY HAVING
A CONVERSATION WITH REA.
ONLY BY HIS GESTURE WILL REA
KNOW WHAT WORD HE'S STRESSING.
IF WE COULD DO INTONATION
INTERPRETATION WE COULD ALSO
KNOW THAT.
BUT THAT'S AN EVEN
HARDER PROBLEM.
SO ONLY BY THE GESTURE DOES
THE SYSTEM KNOW WHERE THE
STRESS IS BEING PLACED.
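
In the spirit of that functional approach, a sketch: find which word co-occurs with the gesture stroke and treat that word as the stressed one. The timings are invented.

```python
# Instead of classifying hand shape, align the gesture stroke with the
# word it co-occurs with and treat that word as stressed.

words = [("I", 0, 120), ("like", 120, 360),
         ("blue", 360, 640), ("tiles", 640, 980)]  # (word, start_ms, end_ms)

def stressed_word(stroke_ms: int) -> str:
    for word, start, end in words:
        if start <= stroke_ms < end:
            return word
    return ""

# A stroke at 500 ms lands on 'blue': contrastive stress on BLUE tiles.
print(stressed_word(500))
```
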

A clip rolls with the caption "Gesture Tracking."

The man says I LIKE BLUE TILES.

Rea says I LIKE TILES.

The man says I LIKE BLUE TILES.

Rea says BLUE IS MY
FAVOURITE COLOUR.

The man says WHAT IS THAT?

The man says I LIKE BLUE TILES.

Rea says ME TOO.
[laughter]

Justine says SO WHAT YOU'VE JUST SEEN
HERE IS BEING THE NICE
OBSEQUIOUS REALTOR THAT SHE
IS, WHEN HE SAYS I LIKE
BLUE TILES, SHE SAYS TILES ARE GREAT.
WHEN HE SAYS I LIKE BLUE
TILES, SHE SAYS, OH, BLUE'S
MY FAVOURITE COLOUR, TOO.
AND WHEN SHE CAN'T TELL
WHERE THE GESTURE IS,
SHE SIMPLY SAYS ME TOO.
AND YOU'VE ALSO SEEN THERE
THAT WE'RE TRACKING DEIXIS.
THAT'S A NEW ASPECT OF THE
SYSTEM: WE CAN TELL
WHAT THE USER IS POINTING
AT AND WE CAN USE THAT
FOR REFERENCE RESOLUTION.
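
A toy reference resolver in that spirit, echoing her earlier broken-speaker example; the geometry and object names are assumptions.

```python
# When the noun is ambiguous, the tracked pointing location
# disambiguates the referent.

objects = {"left_speaker": (-1.0, 0.0), "right_speaker": (1.0, 0.0)}

def resolve(noun: str, point_xy):
    """Pick the candidate matching the noun nearest the pointed location."""
    candidates = [(name, pos) for name, pos in objects.items() if noun in name]
    if len(candidates) == 1 or point_xy is None:
        return candidates[0][0]          # unambiguous: no deixis needed
    def dist(pos):
        return (pos[0] - point_xy[0]) ** 2 + (pos[1] - point_xy[1]) ** 2
    return min(candidates, key=lambda c: dist(c[1]))[0]

print(resolve("speaker", (0.9, 0.1)))  # pointing right -> right_speaker
```
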
OKAY.
SO THAT'S REA, AND YOU
SHOULD GET A SENSE OF HOW
THESE THEORIES OF
HUMAN-HUMAN INTERACTION HAVE
BEEN USED IN BUILDING THIS
SYSTEM, HOW FUNCTIONS AND
BEHAVIOURS ARE KEPT
SEPARATELY, THE IMPORTANCE
OF TIME WITHIN THE SYSTEM
AND BETWEEN THE SYSTEM AND
THE HUMAN, AND HOW
MULTI-MODALITY IS USED FOR
ITS FUNCTIONALITY AND
NOT JUST BECAUSE WE CAN.

A slide appears with the title "Video Recognition Context."

Justine says I'LL JUST SHOW YOU A CLOSER
PICTURE OF MAC, BECAUSE I
THINK HE'S REALLY GORGEOUS.

A slide shows a picture of a blue robot.

Another slide appears with the title "Embodied Learning Peer."

Justine says AND ANOTHER SYSTEM THAT
WE'VE BEEN WORKING ON.
SAM.
I TOLD YOU ABOUT SAM.
AND THE IMPORTANCE OF SAM IS
THAT THERE'S BEEN A LOT OF
WORK DONE ON TUTORING
SYSTEMS FOR A LONG TIME, BUT
TUTORING SYSTEMS ARE ALWAYS
EXPERTS TUTORING CHILDREN.
BUT FROM THE LITERATURE ON
CHILDREN'S DEVELOPMENT, WE
KNOW THAT IN MANY TASKS
CHILDREN FUNCTION AT A
HIGHER LEVEL IN
INTERACTION WITH A PEER.
SO MORAL DECISION MAKING IS
MORE ADVANCED WITH A PEER.
LINGUISTIC SOPHISTICATION IS
MORE ADVANCED WITH A PEER
AND SO FORTH.
AND SO SAM IS A LEARNING
COMPANION WHO DOESN'T TUTOR
A CHILD BUT WHO WORKS WITH
A CHILD AND THAT'S THE
SAM SYSTEM CLOSE UP.
SO ALL OF THESE THINGS,
ALL OF THESE SYSTEMS GET
CAREFULLY EVALUATED.
WE DO HUMAN-HUMAN
INTERACTION IN THE BEGINNING.
WE BUILD A SYSTEM AND
THEN WE PUT THE SYSTEM IN
INTERACTION WITH A HUMAN AND
LOOK AT THE OUTCOME, AND
WE'VE DONE A NUMBER
OF USER STUDIES.
WE DID AN EARLY USER STUDY
WHERE WE PITTED FACIAL
DISPLAYS TO REPRESENT
EMOTION AGAINST FACIAL
DISPLAYS TO REPRESENT THE
REGULATION OF CONVERSATION,
AND WE WERE ABLE TO SHOW
THAT THE REGULATION OF
CONVERSATION IS MORE
IMPORTANT TO USERS'
BEHAVIOUR THAN IS EMOTION,
AND IS MORE IMPORTANT TO
THEIR SENSE OF THE SYSTEM.
THEY THOUGHT THE SYSTEM THAT
WAS EMOTIONAL WAS FRIENDLY
BUT STUPID, AND THEY THOUGHT
THE SYSTEM THAT COLLABORATED
USING REGULATORY BEHAVIOURS
WAS SMARTER AND JUDGED
THEMSELVES TO BE
MORE COLLABORATIVE.
WE'VE ALSO JUST DONE A LARGE
STUDY ON THE USE OF SMALL TALK.
WE'VE BUILT A NEW KIND OF
DISCOURSE PLANNER ACTUALLY
BASED ON PATTIE MAES' DO THE
RIGHT THING WORK AND WE'RE...
THIS IS A BIG TOPIC.
I'LL TALK MORE ABOUT
THIS THIS AFTERNOON.
BUT WE'VE BEEN LOOKING AT
THE USE OF SMALL TALK IN
CONVERSATION, NOT BECAUSE
IT'S CUTE... ALTHOUGH IT IS
PRETTY CUTE... BUT BECAUSE
IN HUMAN-HUMAN CONVERSATION
PEOPLE USE SMALL TALK TO
MITIGATE FACE THREAT... THAT
IS TO MAKE A CONVERSATION
GO EASIER... TO REDUCE
INTERPERSONAL DISTANCE AND
TO INCREASE TRUST AND WE'VE
BEEN ABLE TO SHOW THAT REA,
WHEN REA USES SMALL TALK, A
LARGE NUMBER OF USERS JUDGE
THE SYSTEM TO BE MORE
TRUSTWORTHY AND THEMSELVES
TO BE MORE TRUSTING,
AND JUDGE THEMSELVES
TO BE MORE ENGAGED.
BUT WHEN I SAY A LARGE
NUMBER OF USERS, WHAT WE
DISCOVERED, TO OUR SURPRISE,
IS THAT EXTROVERTS NEED
SMALL TALK TO ENGAGE WITH
THE SYSTEM AND INTROVERTS
DON'T LIKE IT.
SO PEOPLE WHO DON'T LIKE TO
SPEAK A LOT AND WHO DON'T
LIKE TO HAVE CONVERSATIONS,
DON'T LIKE REA'S ATTEMPT TO
DRAW THEM OUT."

A slide appears with the title "Small Talk Evaluation."

Justine says AND YOU CAN SEE THAT IN
THESE FABULOUS CHARTS WHERE
IT SHOWS THAT INTROVERTED
USERS DON'T REALLY USE THE
SMALL TALK.
EXTROVERTED USERS HAD A
RADICAL DIFFERENCE IN THEIR
TRUST OF THE SYSTEM WITH
SMALL TALK AND WITHOUT, AND
WE'VE ALSO LOOKED AT
CONVERSATIONAL INITIATION
AND WE FIND THAT PEOPLE WHO
INITIATE CONVERSATION LIKE
SMALL TALK.
PEOPLE WHO WAIT FOR THE
SYSTEM TO INITIATE DON'T
LIKE SMALL TALK.
OKAY.
YOU CAN GUESS THE NUMBER OF
FUTURE CHALLENGES BY THE
FACT THAT THERE ARE SO MANY
BOLDED ITEMS ON THIS LIST.
THIS IS A LARGE
RESEARCH PROJECT.
THERE'S A LOT OF THEORY
BUILDING THAT WE DO WITHIN
EACH ASPECT OF THE SYSTEM.
FOR EXAMPLE, EVEN JUST
WITHIN THE INTEGRATION OF
SPEECH AND HAND GESTURE.
WE LOOK VERY CAREFULLY AT
THEORIES OF DISCOURSE STRUCTURE.
WE'RE CURRENTLY LOOKING
AT THE MORPHOLOGY OF HAND
GESTURES AND HOW TO PREDICT
MORPHOLOGY FROM THE
SEMANTICS OF SPEECH.

A slide appears with the caption "Future Challenge."

Justine says WE'RE LOOKING AT RECOGNITION
OF INTONATION, A BETTER
FUSION OF MULTI-MODAL
BEHAVIOURS IN THE INPUT.
BETTER SYNCHRONIZATION IN THE
OUTPUT, AND MORE EVALUATION.
OKAY.
SO LET ME CONCLUDE BY SAYING
THAT SEAMLESS COMPUTING IS
CLEARLY GREAT.
I MEAN OBVIOUSLY
WE WANT THIS.
PERVASIVE, UBIQUITOUS
INTELLIGENCE IN OUR SPACES
COULD ONLY BE A GOOD THING.
BUT DOES IT NEED
TO BE INVISIBLE?
COULD WE LOCALIZE
THAT INTERACTION?
INTERACTING IN CONTEXT, THAT IS,
AS WITH OTHER PEOPLE, I THINK,
DEMANDS A MODEL OF HOW WE
INTERACT WITH OTHER PEOPLE
IN CONTEXT, WHICH MEANS HOW
WE USE OUR BODIES IN SPACE.
WE NEED A MODEL OF THAT
HUMAN BEHAVIOUR AND WE NEED
TO KNOW WHERE
THAT MODEL STOPS.
FOR EXAMPLE, YOU'LL NOTICE
THAT I DON'T TALK ABOUT
EMOTION, AND THE REASON I
DON'T TALK ABOUT EMOTION IS
THAT I DON'T THINK THAT WE
HAVE AN ADEQUATE MODEL OF
THE RELATIONSHIP BETWEEN THE
FUNCTIONALITY OF EMOTION AND
THE SURFACE LEVEL
REALIZATION OF EMOTION ON
THE BODY, AND SO
I DON'T DO IT.
I THINK THAT IT'S KEY
FOR REASONS LIKE THAT TO
DISTINGUISH THE CONTENT OF
INTERACTION, PROPOSITIONAL
FUNCTIONALITY, FROM DEVICES
FOR ITS REGULATION,
INTERACTIONAL FUNCTIONALITY.
THAT IT'S KEY TO LOOK AT HOW
THESE SYSTEMS CAN EXIST
IN OUR SPACES AND
SHARE OUR REALITY.
AND THAT WHEN WE LOOK
AT THINGS LIKE ANANOVA,
HUMAN IS AS HUMAN DOES.
GREAT.
IT'S DECORATIVE.
SHE HAS GREEN HAIR.
SHE HAS A LITTLE SNEER.
BUT THE BODY IN THE
INTERFACE IS NOT FOR DECORATION.
IT'S FOR COMMUNICATIVE
CONTENT AND PROCESS.
AND WE CAN BUILD SUCH A
SYSTEM BY TAKING INTO
ACCOUNT THE FOUR PROPERTIES
OF THE MODEL I'VE TALKED
ABOUT AND IT'S NICE TO
NOTE THAT SUCH A MODEL HAS
ALLOWED US TO EXTEND OUR
WORK INTO THE SOCIAL
FUNCTIONALITY OF THE
INTERFACE, INTO SYSTEMS THAT
USE TANGIBLE INTERFACES
AS WELL AS SCREEN-BASED
INTERFACES AND INTO
TUTORING SYSTEMS.

Classical music plays as the end credits roll.

Comments and queries, email: bigideas@tvo.org

Telephone: (416) 484-2746.

Big Ideas, TVONTARIO, Box 200, Station Q, Toronto, Ontario, Canada. M4T 2T1.

Producer, Wodek Szemberg.

Associate Producer, Mike Miner.

Sound, Maurice Dalzot.

Executive Producer, Doug Grant.

A production of TVOntario. Copyright 2001, The Ontario Educational Communications Authority.
