Using Open Source and IBM Watson to Extract Data from Video – InApps

Key Summary

This InApps.net article, published in 2022, details a presentation by BlueChasm’s CTO Ryan VanAlstine and developer Robert Rios at the Watson Developer Conference, showcasing a method to extract tagged data from videos using open-source tools and IBM Watson. Written with a technical, procedural tone, it aligns with InApps Technology’s mission to cover software development and data science trends, offering a practical guide for video data processing.

Key Points:

  • Context: With video projected to comprise 70% of internet data by 2023, automating data extraction from videos is critical, as demonstrated by BlueChasm’s solution combining open-source tools and IBM Watson.
  • Core Insight: A streamlined process using FFmpeg, OpenCV, Node.js, and Watson’s APIs enables automated, cost-effective extraction of meaningful tags from video frames, summarizing content without human intervention.
  • Key Features:
    • Process: FFmpeg extracts one frame out of every 30 (a ratio adjustable to the video’s complexity) as a JPEG, and Watson’s Visual Recognition API then processes those frames to generate tags summarizing the video’s content.
    • Toolset: Leverages IBM Bluemix, FFmpeg, OpenCV, and Node.js, with optional Watson Audio and Tonal Analysis APIs for enhanced insights (e.g., emotional content from audio).
    • Optimization: Avoids costly facial recognition unless needed, prioritizing object recognition for efficiency, as in car race videos where people are incidental.
    • Error Handling: Includes checks for synchronous image processing errors to ensure reliable tag aggregation.
  • Outcome: BlueChasm’s open-source solution, available on their blog, enables scalable, automated video data extraction, enhancing applications like customer service analysis and content summarization.

This article reflects InApps.net’s focus on innovative software development and data science, providing an inclusive, practical overview of video data extraction techniques.

With nearly 70 percent of Internet data projected to be in video format by next year, it’s clear that the task of extracting textual data from video will be critical for data engineers, and that the process will have to be automated. BlueChasm CTO Ryan VanAlstine and software developer Robert Rios demonstrated how to turn raw video into tagged data during the recent Watson Developer Conference in San Francisco.

Using a variety of open source tools and a simple algorithm, they are able to extract enough meaning from a video to summarize its content. The program starts automatically when a new video is submitted, keeping the entire process out of human hands. The code is available on their blog.

Video is just a sequence of images, but sending every image through visual recognition is prohibitive in both cost and time. The key is sending a representative sample from the video. Picking one frame out of every 30, the code sends those images to Watson’s visual recognition service, which returns tags for each one. The program then adds up all the tags to determine what the video is about.

The key ingredients? IBM’s Bluemix services platform, FFmpeg video conversion software, the OpenCV multicore processing library on top of Node.js, with a dash of Watson’s Visual Recognition API.

BlueChasm’s Video Recognition Program

Rios dropped the video into FFmpeg, which processes the video, creates still images by picking one frame in 30, and sends those frames to a folder as JPEG images. The 1-in-30 ratio is an arbitrary number, he said, picked because he knew the video was slow-paced. If the video has a number of camera angles or lots of people, you may want to decrease the ratio to get more frames.

He prefers FFmpeg over other tools because it gives him more flexibility with the video, allowing him to add timestamps and create metadata. FFmpeg creates still images (one frame per second of video) and loads the JPEGs into a folder.
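A minimal sketch of that extraction step from Node.js, assuming the ffmpeg binary is installed and on the PATH; the file names and the 1-in-30 sampling ratio here are illustrative, not BlueChasm’s exact code:

```javascript
// Sketch: extract roughly one frame out of every 30 from a video with FFmpeg,
// writing the stills as JPEGs into a ./frames folder.
const { execFile } = require('child_process');
const fs = require('fs');

fs.mkdirSync('frames', { recursive: true });

execFile('ffmpeg', [
  '-i', 'input.mp4',                 // source video (illustrative name)
  '-vf', 'select=not(mod(n\\,30))',  // keep every 30th frame
  '-vsync', 'vfr',                   // drop the skipped frames from the output
  'frames/frame_%04d.jpg'            // numbered JPEG stills
], (err) => {
  if (err) return console.error('ffmpeg failed:', err);
  console.log('Frames written to ./frames');
});
```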

The resulting JPEGs land in a newly created folder, which sets off a loop that sends each image to the classify endpoint of Watson’s Visual Recognition API.

The classify endpoint code does some error checking because sometimes the classifiers come back empty or there is an error loading the tags, said Rios. Node sends the images synchronously, which can occasionally cause errors when receiving results, so it’s good to do error checking before adding up the results. If an image comes back with an error, it’s marked unavailable.
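A minimal sketch of that loop; classifyImage() here is a hypothetical stand-in for the HTTP call to the Visual Recognition classify endpoint, since the endpoint URL, credentials, and response format depend on your Bluemix service instance:

```javascript
// Sketch: loop over the extracted JPEGs, classify each one, and collect the tags.
const fs = require('fs');
const path = require('path');

// Hypothetical stand-in for the real request to Watson's classify endpoint;
// returns a canned tag list so the control flow can be run as-is.
async function classifyImage(imagePath) {
  return ['car', 'race track']; // replace with the actual classify request
}

async function tagFrames(folder) {
  const allTags = [];
  for (const file of fs.readdirSync(folder).filter((f) => f.endsWith('.jpg'))) {
    try {
      const tags = await classifyImage(path.join(folder, file));
      if (!tags || tags.length === 0) {
        // Sometimes the classifiers come back empty; mark the frame unavailable.
        console.warn(file, 'returned no classifiers; marked unavailable');
        continue;
      }
      allTags.push(...tags);
    } catch (err) {
      // An error loading the tags also marks the frame unavailable
      // rather than stopping the whole run.
      console.warn(file, 'marked unavailable:', err.message);
    }
  }
  return allTags;
}

tagFrames('frames').then((tags) => console.log(tags.length, 'tags collected'));
```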

The code that makes it work is on the BlueChasm blog

The next step is to call the count method, which tallies up the tags and tells you what’s in the video.
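The article doesn’t show the method itself, but a count routine along those lines is simple to sketch (the name count and the sorted output format are assumptions):

```javascript
// Tally how often each tag appears across the sampled frames and sort so the
// most frequent tags, i.e. the video's subject, come first.
function count(allTags) {
  const totals = {};
  for (const tag of allTags) {
    totals[tag] = (totals[tag] || 0) + 1;
  }
  return Object.entries(totals).sort((a, b) => b[1] - a[1]);
}

console.log(count(['car', 'race track', 'car', 'person', 'car']));
// => [ [ 'car', 3 ], [ 'race track', 1 ], [ 'person', 1 ] ]
```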

This process can be combined with audio processing to create more useful tags. For example, visual recognition on a video of a celebrity may return little more than the celebrity’s name; stripping out the audio and sending it to Watson’s audio recognition API can determine what the video is actually about.

You can also send the audio through the Watson Tonal Analysis API, which returns the emotional content of the audio. That is useful for evaluating customer service responses or uploaded product reviews, among other applications.
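A minimal sketch of the audio side, again from Node.js and assuming ffmpeg on the PATH; the downstream transcription and tone-analysis calls are left as comments because their endpoints and credentials depend on your Watson services:

```javascript
// Sketch: strip the audio track out of the video as a mono 16 kHz WAV,
// a format speech services generally accept.
const { execFile } = require('child_process');

function extractAudio(videoPath, wavPath, done) {
  execFile('ffmpeg', [
    '-i', videoPath,        // source video
    '-vn',                  // drop the video stream
    '-acodec', 'pcm_s16le', // uncompressed 16-bit PCM
    '-ar', '16000',         // 16 kHz sample rate
    '-ac', '1',             // mono
    wavPath
  ], done);
}

extractAudio('input.mp4', 'audio.wav', (err) => {
  if (err) return console.error('ffmpeg failed:', err);
  // Next steps (placeholders, not real API code): send audio.wav to the Watson
  // audio/speech API for a transcript, then send that transcript to the
  // tone-analysis API to get its emotional content.
  console.log('Audio written to audio.wav');
});
```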

Be warned, said VanAlstine, that facial recognition is more expensive than object recognition, so you want to separate it out. In the programs they deliver to their customers, it is typical to run the video through object recognition first and, only if the video is mostly about people, send it through facial recognition. For example, a video of a car race might show two people on the sidelines; there’s no reason to send it to facial recognition, because the object recognition data already shows the video is about a car race.
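That routing decision can be driven by the object-recognition tallies themselves. A sketch, reusing the count() output above; the people-related tag list and the 50 percent threshold are illustrative assumptions, not BlueChasm’s production rules:

```javascript
// Only pay for facial recognition when the object-recognition tags suggest
// the video is mostly about people.
const PEOPLE_TAGS = new Set(['person', 'people', 'face', 'man', 'woman']);

function shouldRunFacialRecognition(tagCounts) {
  // tagCounts: [ [tag, count], ... ] as produced by count() above
  const total = tagCounts.reduce((sum, [, n]) => sum + n, 0);
  const peopleHits = tagCounts
    .filter(([tag]) => PEOPLE_TAGS.has(tag))
    .reduce((sum, [, n]) => sum + n, 0);
  return total > 0 && peopleHits / total > 0.5;
}

// A car race with a couple of bystanders: object tags dominate, so skip facial recognition.
console.log(shouldRunFacialRecognition([['car', 40], ['race track', 22], ['person', 2]])); // false
```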

IBM is a sponsor of InApps.

Source: InApps.net
