A feature of video calls that many of us take for granted is how they switch between feeds to highlight whoever is speaking. Great, if speaking is how you communicate. Silent speech like sign language doesn't trigger those algorithms, unfortunately, but this Google research could change that.
It's a real-time sign language detection engine that can tell when someone is signing (as opposed to just moving around) and when they're done. Of course, it's easy for humans to tell this kind of thing, but it's tougher for a video call system that's used to just pushing pixels.
A recent paper by Google researchers, presented (virtually, of course) at ECCV, shows how this can be done efficiently and with very little latency. It would defeat the point if sign language detection worked but resulted in delayed or degraded video, so their goal was to ensure that the software was both lightweight and accurate.
The system first runs the video through a model called PoseNet, which estimates the positions of the body and limbs in each frame. This simplified visual information (essentially a stick figure) is passed to a model trained on pose data from video of people using German Sign Language, which compares the live image to what it thinks signing looks like.
This basic method already achieves 80 percent accuracy in predicting whether or not a person is signing, and with some additional optimization, accuracy rises to 91.5 percent. Considering that "active speaker" detection on most calls is only so-so at telling whether a person is talking or coughing, those figures are fairly respectable.
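To make the two-stage idea above concrete, here is a minimal sketch of the second stage only: given per-frame body keypoints (the "stick figure"), decide whether the person is moving enough to count as signing. This is not the researchers' model, which is trained on pose data; it swaps in a simple motion threshold purely for illustration. The keypoint layout (shoulders at indices 0 and 1), the shoulder-width normalization, and the threshold value are all assumptions of this sketch.

```python
import math
from typing import List, Tuple

# Hypothetical keypoint format: one (x, y) pair per body landmark, per frame.
# Indices 0 and 1 are assumed to be the left and right shoulders.
Frame = List[Tuple[float, float]]


def shoulder_width(frame: Frame) -> float:
    """Distance between the two (assumed) shoulder landmarks."""
    (lx, ly), (rx, ry) = frame[0], frame[1]
    return math.hypot(lx - rx, ly - ry)


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean landmark displacement between two frames, scaled by shoulder
    width so the score does not depend on distance from the camera."""
    scale = shoulder_width(curr) or 1.0  # avoid dividing by zero
    total = sum(
        math.hypot(cx - px, cy - py)
        for (px, py), (cx, cy) in zip(prev, curr)
    )
    return total / (len(curr) * scale)


def is_signing(frames: List[Frame], threshold: float = 0.05) -> List[bool]:
    """One flag per frame transition: True when motion exceeds the
    (assumed, untuned) threshold. A trained classifier would go here."""
    return [motion_score(a, b) > threshold for a, b in zip(frames, frames[1:])]
```

A threshold like this cannot tell signing from someone merely shifting in their chair, which is exactly the distinction the trained model is meant to learn.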
In order to work without adding a new "a person is signing" signal to existing calls, the system pulls a clever trick. It uses a virtual audio source to generate a 20 kHz tone, which is outside the range of human hearing but is picked up by computer audio systems.
This signal is generated whenever the person is signing, making speech detection algorithms think they are speaking out loud.
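The tone trick can be sketched in a few lines. The example below generates audio samples for a 20 kHz sine wave while a signing flag is set and silence otherwise. The 20 kHz figure comes from the description above; the 48 kHz sample rate, the amplitude, and the function name `tone_chunk` are assumptions made for illustration, and routing the samples into a virtual audio device is left out.

```python
import math

SAMPLE_RATE = 48_000  # assumed; must be above 40 kHz to represent a 20 kHz tone
TONE_HZ = 20_000      # tone frequency described in the article
AMPLITUDE = 0.1       # assumed level, on a -1.0 to 1.0 sample scale


def tone_chunk(signing: bool, start_sample: int, n_samples: int) -> list:
    """Return n_samples of audio: a 20 kHz sine while signing, else silence.

    start_sample is the running sample index, so consecutive chunks
    continue the same waveform without a phase jump.
    """
    if not signing:
        return [0.0] * n_samples
    step = 2.0 * math.pi * TONE_HZ / SAMPLE_RATE  # phase advance per sample
    return [
        AMPLITUDE * math.sin(step * (start_sample + i))
        for i in range(n_samples)
    ]
```

For example, `tone_chunk(True, 0, 480)` yields 10 ms of tone at the assumed sample rate, and `tone_chunk(False, 480, 480)` yields the following 10 ms of silence once signing stops.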
Right now, it's just a demo that you can try out here, but there doesn't seem to be any reason why it couldn't be built straight into existing video call services, or even offered as an app that piggybacks on them. You can read the full paper here.