Today, we’ll be taking a slightly different approach to the standard BostInnovation interview. MassChallenge winner 3playmedia is a company that specializes in what they refer to as “intelligent transcripts.” They don’t just take your media and transcribe it; they make it actionable and bring it to life. After speaking with Jeremy Barron and Chris Antunes, co-founders of 3playmedia, we decided there was no better way to show readers what 3playmedia is all about than to use their technology to produce this interview. We recorded the interview, and when it was over the guys gave me access to a demo account. It was pretty remarkable to see their product in action. Every single word of the interview was actionable: I could click on a word in the transcript and it would take me to the exact point in the audio, making it seamless and easy to navigate through the piece. I was also blown away by the accuracy of the transcript, which required almost no post-editing on my part. Below you’ll see the exact transcript of our entire interview, in its completely natural state as produced by 3playmedia’s technology:

3playmedia Interview

CHRIS ANTUNES: Alrighty. Here we go.

BOSTINNO: Alright, so for those who might not know you, can you tell me a little bit about 3playmedia’s history, and what exactly the company does?

CHRIS ANTUNES: Sure. So we have a process that leverages speech recognition technology as our starting point to create high-quality transcripts and closed captions. Using speech recognition as a starting point allows us to create transcripts much faster and thereby reduce our costs. In addition to driving costs down, this process creates time-synchronized transcripts, which means that each word in a transcript knows exactly where it lives in the video’s timeline. This enables you not only to find the video that contains your keyword when you’re conducting a search, but also to find exactly where in the video that keyword appears. These time-synchronized transcripts have many different applications, including closed captioning for the hearing impaired at a dramatically lower price than traditional captioning, and video archive search: a centralized, searchable repository of all your videos that lets users search for keywords across the archive to identify, play back, save, and share relevant segments of the videos.
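The idea that each word knows where it lives in the video’s timeline can be illustrated with a short Python sketch. It assumes a hypothetical data layout in which every word is stored with its start and end time in seconds, and an archive maps video IDs to transcripts; the interview doesn’t describe 3PlayMedia’s actual format, so the names `Word`, `normalize`, and `search_archive` are illustrative only. A keyword search then returns not just which videos match but the exact second at which a player would start playing.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the video (hypothetical layout)
    end: float


def normalize(token: str) -> str:
    """Lower-case a token and strip surrounding punctuation for matching."""
    return token.lower().strip(".,?!;:\"'")


def search_archive(archive: dict[str, list[Word]], keyword: str) -> list[tuple[str, float]]:
    """Return (video_id, start_time) for every occurrence of keyword across the archive."""
    target = normalize(keyword)
    hits = []
    for video_id, transcript in archive.items():
        for word in transcript:
            if normalize(word.text) == target:
                hits.append((video_id, word.start))
    return hits


archive = {
    "lecture-01": [Word("Closed", 12.0, 12.4), Word("captions", 12.4, 13.1), Word("cost", 13.1, 13.5)],
    "lecture-02": [Word("Why", 3.0, 3.2), Word("captions?", 40.2, 40.9)],
}
print(search_archive(archive, "captions"))
```

Running it prints `[('lecture-01', 12.4), ('lecture-02', 40.2)]`. Exact whole-word matching with a linear scan keeps the sketch simple; a production search system would more likely use an index and support phrases and word variants.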

BOSTINNO: Very cool.

CHRIS ANTUNES: Does that all make sense?

BOSTINNO: Totally. Go into a little bit more detail about the interactive transcripts you described.

CHRIS ANTUNES: Sure. So the interactive transcripts are essentially time-synchronized transcripts. Each word inside the transcript has a time stamp associated with it, which allows you to do a couple of things. One, you can search within the transcript and quickly find the specific keyword you searched for. And then, when you click on that keyword, you navigate directly to that point in the video.

JEREMY BARRON: Just to jump in, just imagine you’re looking at a transcript below a video. As that video plays, each word being spoken is highlighted in the transcript. At the same time, you can click on any point in that transcript, and the video automatically syncs and starts playing from that point.

Another thing that results from having time data associated with each word is the ability to select text and, with just one click, share a link to it through Facebook or Twitter. When your friend clicks that link, instead of just being taken to the video, they’re taken to the exact point in the video: it starts playing at the beginning of the text you highlighted and stops playing at the end of it. Additional functionality that the time-synchronized transcript enables is the concept Chris mentioned of clipping and search. Imagine you have 100 hours of video. You’re a market research firm, and you’re trying to identify a couple of key, salient points. You could quickly search that entire 100-hour archive by keyword, locate the exact clips that are relevant to what you’re trying to prove, select that text, and add it to your custom clipboard. Then, in a matter of a few minutes, you could create 10 or 15 relevant clips in a custom playlist, identified from 100 hours of video. And you can share that custom playlist, as I mentioned, through a URL, put it in a PowerPoint presentation, or send it over email.
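The share-a-selection and playlist features Jeremy describes follow directly from word-level timestamps: a highlighted span begins at its first word’s start time and ends at its last word’s end time. The Python sketch below assumes a hypothetical layout in which each word carries start and end times in seconds. The URL scheme (`media`, `start`, and `end` query parameters on `player.example.com`) and the helper names `clip_for_selection` and `share_url` are invented for illustration and are not 3PlayMedia’s real link format or API.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the media (hypothetical layout)
    end: float


def clip_for_selection(transcript: list[Word], first: int, last: int) -> tuple[float, float]:
    """Map a highlighted word range (inclusive indices) to a (start, end) time span."""
    if not 0 <= first <= last < len(transcript):
        raise ValueError("selection is outside the transcript")
    return transcript[first].start, transcript[last].end


def share_url(base_url: str, media_id: str, start: float, end: float) -> str:
    # Hypothetical URL scheme: assumes the player honors 'start' and 'end' query parameters.
    query = urlencode({"media": media_id, "start": f"{start:.1f}", "end": f"{end:.1f}"})
    return f"{base_url}?{query}"


transcript = [
    Word("closed", 12.0, 12.4),
    Word("captions", 12.4, 13.1),
    Word("are", 13.1, 13.3),
    Word("expensive", 13.3, 14.0),
]
start, end = clip_for_selection(transcript, 1, 3)  # user highlights "captions are expensive"
print(share_url("https://player.example.com/watch", "lecture-42", start, end))

# A playlist is just a list of clips; each clip is (media_id, start, end).
playlist = [("focus-group-07", 312.0, 341.5), ("interview-03", 1290.0, 1322.0)]
total_seconds = sum(clip_end - clip_start for _, clip_start, clip_end in playlist)
print(total_seconds)
```

This prints `https://player.example.com/watch?media=lecture-42&start=12.4&end=14.0` followed by `61.5`, the total running time in seconds of the two-clip playlist. Because a clip is only a media ID plus two numbers, a playlist drawn from many hours of video stays tiny and easy to share as a link.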

BOSTINNO: Excellent. Talk to me a little bit about the company’s back-story. You touched on it a little bit, but what problem did 3PlayMedia really set out to solve? And how did you guys meet up and form 3PlayMedia?

JEREMY BARRON: So we all met at Sloan, the business school at MIT. But originally, the problem we set out to solve was: how do you make closed captions cheaper? Closed captioning is traditionally very expensive, and our original goal was to drive those costs down. Now, how did we stumble upon that issue? How did we get interested in it? That relates to one of our other co-founders, Chris Johnson. Before attending Sloan, he had worked at MIT OpenCourseWare, or MIT OCW. If you’re not familiar with OCW, it’s basically a web-based publication of virtually all of MIT’s course content, including hundreds if not thousands of hours of recorded video lectures. Their goal is really to get as much video and course material as possible online, free and accessible to the public.

Now, part of their funding requirements is that they make their published courseware video accessible to the hearing impaired. When they tried to do that through closed captions, they approached standard providers and found that the cost of closed captioning was just prohibitively high. So they got in contact with Chris Johnson and asked: is there any way you can look into this and see if you can create a process, an innovative solution, to drive the costs down? Typical captions are really expensive because the transcript has to be created manually, and then each caption frame has to be individually synchronized, again manually, to the source video file. Obviously, that takes a lot of time, and it’s really costly. At the other extreme, you could use pure speech recognition, or pure automation, which would be cheap. But the quality just isn’t there to meet all the captioning requirements. That’s especially true when you have difficult accents or really specialized terminology, things that really degrade the quality of the speech recognition output.

And so our solution was a hybrid approach: drive down the cost through the use of speech recognition, but keep quality high by having humans review the speech recognition output word for word and edit it in our customized, proprietary editing platform.

BOSTINNO: So what has been 3PlayMedia’s response to the rise in voice-to-text products, such as Nuance’s Dragon NaturallySpeaking?

CHRIS ANTUNES: Sure. Obviously, this is a question we think about a lot. Jeremy actually touched on our process a little bit, so this might repeat some of that. But speech recognition is an input, in fact the first input, into our process. Every video or audio file we receive first passes through an automatic speech recognizer to create a rough draft of the transcript. This draft is then edited by 3Play-trained editors, who review the entire transcript word for word to ensure high quality. And of course they can do this much more quickly because they have a speech recognition first draft, as opposed to having to start from scratch. As a result, we see speech recognition solutions, like the Nuance one you mentioned, as complementary rather than competitive. As speech recognition improves, our rough draft improves, and our costs are reduced even further. That being said, we obviously recognize there are circumstances in which speech recognition alone is sufficient. But we are focused on markets where accuracy is critical, and there it simply is not.

3playmedia co-founders Chris Antunes (left) and Jeremy Barron (right)

BOSTINNO: Let’s go back to the human side of 3PlayMedia’s involvement. How much of the product’s accuracy comes from human involvement? Is it mostly... Oh, sorry, you understand.

CHRIS ANTUNES: Yes, so let’s just take the transcription side of it. Traditionally, transcription is a completely manual process: you start from scratch with a Word document and a foot pedal, and it might take five hours or even more, on average, to transcribe one hour of audio content. In our case, the speech recognition first pass gets you 60 or 70 percent of the way there in terms of the accuracy of the transcript. When our editors go in and clean up the transcript, they can typically finish it in about two or three hours, depending on the quality of the audio recording and therefore the quality of the speech recognition first draft we get back. So from a time perspective, the majority of the time is still in the review conducted by humans, because speech recognition runs at near real time, one to one: if you give us a one-hour file, we can return the speech recognition output in about an hour. So the majority of the time is still on the human side, but we’ve significantly reduced the total time it takes to create a high-quality transcript.

BOSTINNO: Sure. How many people do you have right now working at 3PlayMedia?

JEREMY BARRON: So we have right around 50 contractors who clean up that speech recognition output. And then internally we have about eight people in management, development, and so on.

BOSTINNO: Very cool. So getting back to the question I was asking before, who would you consider 3PlayMedia’s top two direct competitors? And who are some of your indirect competitors?

JEREMY BARRON: So I think the competitors typically break down by market. On the direct side, I would say Automatic Sync Technologies is a direct competitor, specifically in the education market for web-based closed captions. For the interactive transcripts we described before, I would say SpeakerText is probably a direct competitor; they’re doing sentence-to-sentence synchronization. But the interactive transcript market is really new and fairly nascent, so I would say no company has established itself as the market leader at this stage. Other direct competitors are cheap offshore transcription services. Where we’ve used technology and process, other companies have tried to reduce costs by going to lower-cost labor in India or the Philippines.

As for indirect competitors, as we described before, people who produce pure speech recognition would be indirect competitors in markets where customers really need speech recognition output but don’t need it to be extremely accurate, where a 60 or 70 percent accuracy rate is sufficient. And in the education space, an indirect competitor is always internal resources. Universities are very frugal, and they typically use student labor or interns wherever they possibly can to do the actual transcription and manual synchronization of captions. But what we’ve found is that it’s one of those things where it feels like it makes sense from a financial perspective to leverage your internal resources. Yet if it’s taking an internal resource 10 hours to transcribe one hour of audio, you can quickly lose track of how much time and resources are being consumed in that process.

BOSTINNO: Absolutely. You did touch on this a little bit, but drilling down in a little more detail, how does 3PlayMedia differentiate itself from the competition? What makes you unique and sets you apart?

CHRIS ANTUNES: Sure. On the product side, as I explained earlier, the core asset we produce is this word-to-word time-synchronized transcript. We take that asset and build many derivative products from it, including closed captions, the interactive transcript, and various applications of those. But this word-to-word time-synchronized transcript is very hard to create. Most of our competitors, like Jeremy mentioned, create it at a sentence level. Or, in the case of pure speech recognition solutions, it obviously doesn’t have the accuracy that our transcript has. So the highly accurate word-to-word time-synchronized transcript is certainly a differentiator.

On the process side, as we’ve mentioned, our costs are driven down by the efficiency gains in our process. And something we didn’t mention: for the workforce of editors that Jeremy described, we have a unique process for determining how jobs are allocated to them as we receive them. We tag each customer, and also each individual file that comes in, along a number of different dimensions. Similarly, as we learn more about our editors over time and review the quality of their work, we tag them too. An example would be a certain domain expertise, like financial, medical, or legal. So as each file comes into the system, if that file is tagged as a medical file, we can dynamically permission it so that it’s only available to editors who are also tagged as having medical expertise. This sort of dynamic job matching in our internal process helps ensure much higher-quality transcripts.
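The tag-based permissioning Chris describes can be sketched as a simple set comparison in Python. The rule used here, that an editor is eligible only if their tags include every tag on the file, is an assumption for illustration; the interview only says that a file tagged as medical is made available to editors tagged with medical expertise. The `Editor` and `MediaFile` types, the `eligible_editors` helper, and the sample names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Editor:
    name: str
    tags: set[str] = field(default_factory=set)  # e.g. domain expertise such as "medical"


@dataclass
class MediaFile:
    file_id: str
    tags: set[str] = field(default_factory=set)  # dimensions assigned when the file arrives


def eligible_editors(media_file: MediaFile, editors: list[Editor]) -> list[Editor]:
    """Return editors whose tags cover every tag on the file (assumed matching rule)."""
    return [e for e in editors if media_file.tags <= e.tags]


editors = [
    Editor("ana", {"medical", "legal"}),
    Editor("ben", {"financial"}),
    Editor("cho", {"medical"}),
]
job = MediaFile("grand-rounds-14", {"medical"})
print([e.name for e in eligible_editors(job, editors)])
```

This prints `['ana', 'cho']`. Note that with this rule a file carrying no tags is open to every editor, since the empty set is a subset of any set.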

BOSTINNO: Very cool. So on the marketing side, how are consumers finding out about 3PlayMedia? Is it word of mouth, search, pay-per-click, or events?

JEREMY BARRON: I would say all of the above, and it depends on the size of the opportunity. Smaller opportunities we’ve found through pay-per-click, Google AdWords, and inbound marketing efforts. We’ve done a lot of search engine optimization, we keep our blog fresh, and we send out monthly newsletters. We also solicit leads by putting coveted content, such as detailed pricing information, behind a form that someone has to fill out to access that information.

I’d say medium and larger opportunities mostly stem from conferences and trade shows. We were recently at EDUCAUSE out in Anaheim. And word of mouth, articles, and networking are obviously huge. We’re also doing some outbound sales and cold calling. It has been somewhat effective, but we put a limited amount of resources into it.

BOSTINNO: Sure. Congratulations on being one of the $50K winners in the recent MassChallenge competition. Can you tell us a little bit about the experience of being part of MassChallenge, some of the key individuals who helped you, and how it shaped and impacted the company?

JEREMY BARRON: Yeah, being part of MassChallenge was an amazing experience, beyond the financial award and the subsequent marketing benefits of being a finalist and winning. We were also connected with, and got incredible advice from, all sorts of individuals. We were even able to find an engineer who is hopefully on track to join our company in the near future. We also had unbelievable exposure to other entrepreneurs. We were given workspace, although we didn’t use it as much as some companies because we have our own office space in Porter Square, and just being able to interact with other young entrepreneurs who have a lot of energy and are out solving problems is tremendously valuable. You can bounce questions off them, they can bounce questions off you, and you can learn from them. There’s a lot of camaraderie associated with that. In addition, we’ve been connected to numerous sources of funding: VCs, private equity firms, and angels. We raised an angel round about a year ago, and although we aren’t actively pursuing funding, we’re always interested in having those conversations. Just being connected with those individuals is really valuable. It was an unbelievable experience.

BOSTINNO: My final question would just be, moving forward what are 3PlayMedia’s plans for the future?

CHRIS ANTUNES: Sure. A question we’ve gotten quite a bit from our customers, particularly in the area of accessibility, has been translation. One thing we failed to mention earlier when describing the interactive transcript is that it now has an automatic translation option, which uses a Google translation API to translate both the interactive transcript and the closed caption files for the video, on the fly, into many different languages. Again, that’s an automatic solution powered by Google’s translation technology. But right now we’re in the process of launching a professional translation option, which of course will be much higher quality, from English to French, Spanish, and German, with many other languages to follow soon after. That’s one of the big immediate projects we’re working on.

Another, which again was driven by many customer requests, is a shorter turnaround option. Right now, our standard turnaround is three to five business days, and we have a rush option of 24 hours. But we’re thinking about ways to turn files around in a much shorter time, say four to six hours after receiving the audio file. That’s of a lot of value to customers with particularly time-sensitive audio or video, for instance conferences or financial calls. So those are two of the big projects we’re working on right now.

BOSTINNO: Very cool. Well Jeremy and Chris, thank you guys so much for your time today…

CHRIS ANTUNES: Sounds good. Sean, if you don’t mind, I actually realized I forgot to mention one thing.

BOSTINNO: Please.

CHRIS ANTUNES: So we talked a lot about the technology, but I realize we didn’t talk very specifically about the markets we address with it. Of course, we’re looking to address any market where there’s a lot of audio and video and where, in our case, time-synchronized transcripts could have a material and immediate impact. The four we’re focused on most right now are these. First, education, for the purpose of accessibility and closed captioning, where we can bring a solution that’s often much cheaper than the alternatives out there. Second, market research. Jeremy touched on this a little in his example about the interactive transcript: if you have hundreds and hundreds of hours of focus groups, one-on-one interviews, and things like that, using our tool can be a big productivity enhancement and help you search and discover new insights you wouldn’t have otherwise. Third, video editing. You can imagine how challenging it is if you have potentially hundreds of hours of raw video footage and need to edit it down to a 30-minute show or something like that. With our solution, you can search quickly and efficiently for keywords and topics you might remember from the filming, so again, we can bring significant productivity gains to that area. And finally, online video archives. The JFK Library, which we’re doing a project with now, is an example: it has a large repository of really high-quality, high-visibility video that is hard for users to access unless they know specifically which video they’re looking for. Here, users can search for a specific key term across that entire archive and be brought immediately to the exact section of video, the exact 10- or 30-second clip they’re looking for, based on a quick keyword search.

So those are the four areas where we’re focused now. Although after talking to you, I can see there’s obvious value for interviewers too, in using this type of service to help write articles and things like that. So I think we’re constantly discovering new potential markets, sometimes even by accident.

BOSTINNO: Absolutely. Very good. So, before I sign off, anything else that you might want to tell the readers at BostInnovation?

JEREMY BARRON: I think that’s pretty much it. We definitely covered a lot of ground, and we really appreciate the opportunity to have this conversation with you. So thanks so much. We’re excited to share our story and what we’re doing now with your readers.

BOSTINNO: Absolutely. I think it’ll make a great piece.

To learn more about 3PlayMedia, follow them on Twitter and Facebook. Check out an example of another interview here.