BookLens - Connecting Physical Books with Digital Content
I have a passion for books. About four years ago, I created a prototype called "Book Review with AR". It allows users to scan the front cover of a book and instantly retrieve its reviews from YouTube.
Possible Next Steps ...
Lately I have been thinking about whether we can extend this use case to bring up relevant content on demand for readers.
For instance, if a student is reading about Newton's Laws and has difficulty understanding them correctly, they can simply use their mobile phone camera to scan the subject. This would bring up relevant content directly on their phone screen, helping them better understand the topic.
This can be further extended to scanning handwritten notes to fetch related content.
Another use case would be fetching content from screenshots.
As I write this, I'm aware that in this age of AI, computer vision has become incredibly advanced, capable of much more than what is required for this application.
BookLens is essentially a student-friendly version of Google Lens, designed to help readers gain a deeper understanding of a subject through digital content such as text, videos, and AR/VR content.
Throughout the rest of this article, I aim to spark thinking about improving the accessibility of online educational content created by dedicated teachers for their students.
This is for company founders and inventors who are looking for their next gig.
Summary of Product
As a student, I can use my camera to identify the sections in my textbook where I need help. This will allow me to easily access a video of my favorite teacher explaining the topic I'm struggling with.
As a business, by utilizing computer vision and machine learning, we can identify sections within books and make educational content more accessible to students.
Customer Segment : Students
Current Alternative : Online Search on YouTube / Google / Education Platform
Is it a problem to solve?
Why can't a student simply search YouTube for the topic and learn about it for free?
Let's first understand how one would do this:
- Type in the search query - "Need advice on this blah blah topic"
- Filter the content which you like the most from list of results
- Open the video
- Find the section in the video that discusses the given topic, which may require scanning through the entire video and can be time-consuming
At every one of these steps, the student can get derailed from their goal of getting help from YouTube.
Problem 1:
The search method described above requires a significant amount of time and effort, which seems wasteful and could have been avoided.
Problem 2:
Students often have preferred teachers in each subject with whom they feel a strong connection and want to learn from them exclusively. These teachers may not necessarily be the best in the world, but their teaching style resonates with the students.
Serving content from a student's favorite teacher within seconds could cover most of their learning needs.
Problem 3:
Most video search platforms rely solely on metadata, such as title, description, watch time and audience engagement, and do not utilize Automatic Speech Recognition (ASR). This approach poses a significant challenge when dealing with educational content created in local languages.
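To make the metadata-vs-transcript gap concrete, here is a minimal sketch in Python. All video data below is made up for illustration; a real system would build its transcript index from ASR output.

```python
# Hypothetical illustration: why metadata-only search misses topics
# that a transcript (ASR) index can find. All data below is invented.

def metadata_search(videos, query):
    """Match the query only against title and description."""
    q = query.lower()
    return [v["title"] for v in videos
            if q in v["title"].lower() or q in v["description"].lower()]

def transcript_search(videos, query):
    """Match the query against ASR transcript segments; return timestamps."""
    q = query.lower()
    hits = []
    for v in videos:
        for seg in v["transcript"]:
            if q in seg["text"].lower():
                hits.append((v["title"], seg["start_sec"]))
    return hits

videos = [
    {
        "title": "Physics Class 11 - Full Revision",
        "description": "Complete revision of the mechanics syllabus.",
        "transcript": [
            {"start_sec": 0,    "text": "Welcome back, today we revise mechanics."},
            {"start_sec": 1240, "text": "Now, Newton's third law: every action..."},
        ],
    },
]

print(metadata_search(videos, "Newton's third law"))    # [] - metadata says nothing
print(transcript_search(videos, "Newton's third law"))  # hit at 1240 seconds
```

The metadata search returns nothing because the title and description never mention the topic, while the transcript index points straight at the moment the teacher covers it.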
Why can't students learn it from videos directly?
Digital content consumption has increased drastically in recent times, but the majority of students still read from books. The comfort of flipping pages at will, placing bookmarks and sticky notes, and sometimes folding corners are habits that are hard to break.
How big is this problem?
Why do we need to think about this now?
From 10,000 feet, I can say millions of readers, but that number doesn't matter if the solution isn't helpful for the set of people who need it. If the solution is good, let's solve it first for a few subjects and a few students. We are helping students here; if it works, it can be extended to other types of readers as well.
What is the USP?
An education-focused, easily accessible, quick-help app.
This technology utilizes Automated Speech Recognition (ASR) to facilitate video search, enabling users to search for content not only in English, but also in local languages.
Integrating relevant videos with book content can yield significant benefits in terms of improving the effectiveness of the search process. Such outcomes cannot be achieved by technology alone, but instead require a thoughtful approach that considers the connection between the book's content and the accompanying multimedia.
Processing a video to extract which subjects/topics are being discussed at which timestamps is another challenge; non-English content adds one more.
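The timestamp-extraction step above can be sketched as a small function that maps each topic to the time window where its keywords appear in the transcript. The topic keyword lists and the transcript here are hypothetical; a production system would use proper topic modeling over real ASR output.

```python
# A minimal sketch of "which topic is discussed when" extraction.
# Topic keywords and transcript segments below are invented examples.

def topic_windows(segments, topics):
    """For each topic, return the (start, end) seconds spanning the
    segments whose text mentions any of the topic's keywords."""
    windows = {}
    for topic, keywords in topics.items():
        times = [s["start_sec"] for s in segments
                 if any(k in s["text"].lower() for k in keywords)]
        if times:
            windows[topic] = (min(times), max(times))
    return windows

segments = [
    {"start_sec": 0,   "text": "Today we cover motion and forces."},
    {"start_sec": 300, "text": "An object stays at rest unless a force acts: inertia."},
    {"start_sec": 620, "text": "Force equals mass times acceleration."},
    {"start_sec": 900, "text": "Action and reaction are equal and opposite."},
]

topics = {
    "first_law":  ["inertia", "stays at rest"],
    "second_law": ["mass times acceleration"],
    "third_law":  ["action and reaction"],
}

print(topic_windows(segments, topics))
```

For non-English content the keyword matching stays the same; the hard part moves into the ASR model that produces the timestamped segments in the first place.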
Tech Challenges in Building BookLens
- Optical Character Recognition (OCR) technology can be used to recognize and extract the text from the scanned image, enabling users to search for specific keywords within the book. [Complexity - Easy]
- To understand the images in a book and enable quick searching for relevant images, we will need image recognition technology, which uses machine learning algorithms to analyze and classify images based on their content. A good set of images would also be required to train the model. [Complexity - Medium]
- Building an Automatic Speech Recognition (ASR) system requires several steps, including data preparation, feature extraction, acoustic modeling, language modeling, and decoding. Services and libraries such as Google Cloud Speech-to-Text, Amazon Transcribe, and Sphinx-4 can be used instead of building this from scratch. [Complexity - Hard]
- Building search over ASR transcripts (LLM-based search) requires an understanding of distributed search engineering. Using an LLM to build context around the search query would be a great idea. You could also use a vector database, but it can become costly if the platform scales up quickly.
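Putting the pieces above together, here is a rough end-to-end sketch: OCR on the scanned page, then a search over a pre-built index of transcript segments. Everything here is an assumption for illustration: `ocr_page` wraps pytesseract (which requires the Tesseract binary), the index format is hypothetical, and the ranking is naive word overlap standing in for a real search engine or vector store.

```python
# End-to-end BookLens sketch under stated assumptions: OCR the scanned
# page, then rank pre-indexed transcript segments against the OCR text.

def ocr_page(image_path):
    """Extract text from a scanned page. Requires the pytesseract
    package and the Tesseract binary; swap in any OCR service here."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(image_path))

def best_match(query, index):
    """Rank indexed transcript segments by naive word overlap with the
    query; return the best (video_id, start_sec) pair, or None."""
    q_words = set(query.lower().split())
    best, best_score = None, 0
    for video_id, start_sec, text in index:
        score = len(q_words & set(text.lower().split()))
        if score > best_score:
            best, best_score = (video_id, start_sec), score
    return best

# Hypothetical pre-built index: (video_id, start_sec, segment_text).
index = [
    ("vid_001", 45,  "introduction to kinematics and motion"),
    ("vid_001", 610, "newton first law of motion and inertia"),
    ("vid_002", 120, "chemical bonding basics"),
]

query = "newton first law inertia"   # in practice: ocr_page("scan.png")
print(best_match(query, index))      # ('vid_001', 610)
```

In a production system the word-overlap ranking would be replaced by the LLM/vector-search layer discussed above; the pipeline shape (scan, extract text, query the transcript index, jump to a timestamp) stays the same.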