Visual Intelligence for Fashion
– Jayguru Panda
Technology has revolutionized retail, from giving small and home businesses a worldwide virtual presence to the likes of Amazon Go stores pushing the limits of deep learning and sensor fusion. Fashion remains the largest and highest-margin category for most retailers, online as well as offline. Recently, fashion shopping has seen much innovation from within enterprises and from emerging start-ups, ranging from personalized shopping experiences on mobile apps to wearable fashion IoT devices. The space is now collectively known as FashTech, reflecting the technology innovation that is improving the entire fashion shopping experience.
Image uploaded by a user to our consumer chatbot, BotCouture, and the bot responding with the items identified and similar products from our product catalogue. In the two rightmost scenarios in the row, the bot identified the user intent as #ShopTheLook, i.e. the user is interested in shopping for multiple items in the image.
Abzooba is a leading start-up in data analytics, using state-of-the-art NLP and deep learning technologies to deliver value-added services for enterprises. When I joined the XpressoCommerce team at Abzooba in early December, I had just stepped out of a stint of nearly two and a half years with my own start-up, Wazzat Labs, which focused on visual intelligence in retail. We had worked with Fortune 500 clientele (Target US, Victoria’s Secret, JC Penney, Lowes) and closer-to-home brands such as Yepme, Limeroad and Myntra. While our sales efforts didn’t seem to materialize, I never lost faith in the potential of the product and the market scope.
Hence the exit, which had multiple causes, aimed for a fit that could help the core technical team at Wazzat find its feet in a bigger organization. The fact that XpressoCommerce had a gap in its visual engine that we could ably fill helped this transition, along with the support from the leadership and team members. As I understand its scope and vision, XpressoCommerce strives to equip enterprise retailers with smart technologies powered by state-of-the-art, bleeding-edge AI: machine learning, deep learning and anything else out there that brings us closer to the humans-evolving-into-cyborgs reality envisioned by my favourite visionary, Elon Musk.
The core focus for XpressoCommerce from Day 0 has been to enable smart chatbots, or conversational robotic agents, that let retailers scale up customer engagement and innovate on the conversational interface to revolutionize the traditional online shopping experience. This also means that we need to make effective use of visual intelligence from product images in fashion categories to enable image-based search, recommendations, and automatic category and attribute prediction.
Further, the goal was to expand the scope to automatically parse the clothing and accessories worn or carried by a person in an unconstrained image (anywhere on the web), and serve personalized recommendations to fashion-hungry consumers. I drew on my previous start-up experience to start building state-of-the-art deep CNN architectures on publicly available and client-specific fashion data to tackle tasks that seem easy for humans but are difficult for machines, such as:
- Identify the objects/people present in an image,
- Automatically identify the human body pose,
- Detect and classify the category of fashion items worn/carried by a person,
- Semantically segment apparel, shoes and bags in unconstrained social fashion photographs,
- Learn a discriminative cross-scenario, multi-modal representation to aid visual and textual search for fashion,
- Automatically predict semantic attributes such as style, texture, colour and fabric for an image of a fashion item.
When working on bleeding-edge solutions, it is of utmost importance to benchmark the performance of our systems against the best systems out there in the industrial as well as academic research communities.
The ideal implementation process starts with thorough research into the state-of-the-art literature and makes use of publicly available benchmark datasets, which we then apply to our use cases.
So, at every step, I put in place a benchmarking framework relevant to the task, which compares our performance with that of leading research systems. For example, we considered the recently published work DeepFashion (http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html), which released a huge benchmark dataset for fashion visual search in shop/street conditions and contributed a novel deep neural network architecture that learns multiple tasks jointly, finding the optimum performance across the tasks. We used a slightly different architecture and our own datasets in combination with DeepFashion, and learnt a feature extractor that generates discriminative image representations.
We benchmarked our performance against this research group's results before deploying in production/test environments. A similar process is followed for the other tasks/modules, which require taking cues from academic and industrial research forums and fitting the solution to our use cases.
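To make the benchmarking idea concrete, here is a minimal sketch of how retrieval quality could be scored once a feature extractor has turned images into vectors. It is not our production code: the function names, the toy vectors and the choice of cosine similarity with top-k accuracy (the fraction of queries whose correct item appears among the first k results) are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_matches(query, gallery, k):
    """Indices of the k gallery vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine_similarity(query, gallery[i]),
        reverse=True,
    )
    return ranked[:k]


def top_k_accuracy(queries, query_ids, gallery, gallery_ids, k):
    """Fraction of queries whose true item id appears among their top-k matches."""
    hits = 0
    for query, true_id in zip(queries, query_ids):
        retrieved = {gallery_ids[i] for i in top_k_matches(query, gallery, k)}
        hits += true_id in retrieved
    return hits / len(queries)


# Toy 2-D vectors standing in for CNN image embeddings (made-up values).
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery_ids = ["shirt", "shoe", "bag"]
queries = [[0.9, 0.1], [0.1, 0.9]]
query_ids = ["shirt", "bag"]  # ground-truth catalogue item for each query

acc_at_1 = top_k_accuracy(queries, query_ids, gallery, gallery_ids, k=1)
acc_at_2 = top_k_accuracy(queries, query_ids, gallery, gallery_ids, k=2)
print(acc_at_1, acc_at_2)
```

Running the same scoring function on a public benchmark split and on our own data gives numbers that can be set side by side with published results, provided the evaluation protocol matches the one used in the paper.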
The XpressoCommerce team recently launched BotCouture, a consumer chatbot currently available on Facebook Messenger, which showcases our enterprise use cases and strives to innovate. In terms of visual intelligence, three consumer query intents were identified:
- View Similar: the consumer is browsing a section/set of products and is interested in discovering more products similar to a particular item.
- The user sends the bot an image of a fashion item to discover shopping options.
- The user sends the bot an image of an entire fashion look with an intent of #ShopTheLook. This would be particularly useful for discovering celebrity outfits and getting inspired by social fashion.
We created a microservice architecture for visual search that can handle one or more of the image recognition tasks as required, and addresses the above three use cases accordingly. We also understood that fashion-hungry consumers who prefer mobile shopping get inspired by photos on social photo-sharing apps like Instagram, Pinterest and Lookbook, and would like to query our bot for visually similar items.
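As a rough illustration of how one entry point can run a different set of recognition tasks per intent, consider the sketch below. Everything in it is hypothetical: the intent keys, the three placeholder task functions and the assumption that a catalogue item already has its features computed are stand-ins, not a description of the actual services.

```python
from typing import Callable, Dict, List


# Each function is a placeholder for a call to one image recognition service.
def detect_items(state: dict) -> dict:
    state["items"] = ["item-in-" + state["image"]]
    return state


def embed_image(state: dict) -> dict:
    state["embedding"] = "vector-for-" + state["image"]
    return state


def search_catalogue(state: dict) -> dict:
    state["matches"] = ["similar-product"]
    return state


# Hypothetical mapping from query intent to the ordered tasks it needs.
PIPELINES: Dict[str, List[Callable[[dict], dict]]] = {
    # Catalogue item: features are assumed to be precomputed, so only search runs.
    "view_similar": [search_catalogue],
    "single_item": [embed_image, search_catalogue],
    "shop_the_look": [detect_items, embed_image, search_catalogue],
}


def handle_query(intent: str, image: str) -> dict:
    """Run the tasks registered for an intent, recording the order they ran in."""
    if intent not in PIPELINES:
        raise ValueError(f"unknown intent: {intent}")
    state = {"image": image, "steps": []}
    for task in PIPELINES[intent]:
        state = task(state)
        state["steps"].append(task.__name__)
    return state


result = handle_query("shop_the_look", "look.jpg")
print(result["steps"])
```

Keeping the intent-to-task mapping in one table means a new use case can be supported by composing existing tasks rather than by writing a new end-to-end service.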
For such social photos, we put in place a mechanism: when a user selects the URL/link from the sharing options within the external app and sends it as a message to our chatbot, we identify the image in the link and respond with visually similar product matches. Here is a link to a YouTube video (https://youtu.be/FWFAq9C9rhE) that the team has created, showcasing the visual intelligence of our chatbot, BotCouture, and briefly comparing it with another consumer chatbot, Mode.ai.
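The link-sharing mechanism described above can be sketched in two steps: pull the URL out of the chat message, then find the main image on the linked page. The sketch assumes the page exposes its preview image through an Open Graph `og:image` meta tag, which many photo-sharing sites do; whether a given site does, and how our bot actually resolves links, is not covered here. Downloading the page is left out, and the helper names and sample HTML are made up for the example.

```python
import re
from html.parser import HTMLParser
from typing import Optional

URL_PATTERN = re.compile(r"https?://\S+")


def first_url(message: str) -> Optional[str]:
    """Return the first http(s) URL found in a chat message, if any."""
    match = URL_PATTERN.search(message)
    # Trim punctuation that often trails a pasted link.
    return match.group(0).rstrip(".,)") if match else None


class OpenGraphImageParser(HTMLParser):
    """Records the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_og_image = tag == "meta" and attributes.get("property") == "og:image"
        if is_og_image and self.image_url is None:
            self.image_url = attributes.get("content")


def image_from_page(html: str) -> Optional[str]:
    """Return the og:image URL declared in an HTML document, or None."""
    parser = OpenGraphImageParser()
    parser.feed(html)
    return parser.image_url


# Made-up message and page source for illustration.
message = "Love this outfit https://example.com/p/abc123, any matches?"
page = (
    '<html><head><meta property="og:image" '
    'content="https://example.com/look.jpg"></head></html>'
)
print(first_url(message))
print(image_from_page(page))
```

Once the image URL is known, the image can be passed to the same visual search used for photos uploaded directly to the bot.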
An example of a user sharing an Instagram image of a popular men’s fashion blogger, with the bot identifying the items worn by the person in the image and showing visually similar matches from the catalogue.
We are at an early stage of this product, and of its visual intelligence in particular, and only time, clients and real users will judge how BotCouture performs. I would appeal to all Abzooba members to sincerely try out BotCouture and critically review it, so that we can keep improving our offerings and delight consumers with a game-changing experience.