We’re working towards a world where chaotic input easily gets transformed into workable, intelligent input, speeding up processes and creating a more efficient working environment for all.
But first things first: what do we mean by chaotic and intelligent “input”? And how do chaotic data or documents become intelligent? In this post, we’ll share some insights on these topics, along with some common challenges that arise in the process.
1. Chaotic input vs. intelligent input
Whether we’re talking about digital or paper-based documents, there are typically three types of input: structured, semi-structured and unstructured. Structured documents usually have the same types of content in the same layout, and this “structure” is stable for long periods of time – a passport, for example.
Invoices, on the other hand, are often semi-structured. This means they might look similar in layout, but there can be several crucial differences.
In unstructured documents, there’s no recognizable pattern in the type of information or the format. Handwritten papers are a typical example here.
It’s common to refer to unstructured content as “chaotic input” and to structured or semi-structured input as “intelligent input”. Today, more than 80% of business data is unstructured, so organizations are looking to transform all this chaotic input into structured, intelligent content. Without this transformation, vast amounts of enterprise knowledge get lost.
2. How to bring structure to unstructured and semi-structured input
Despite all the digital workplace trends and the move towards a paperless office, there’s still a lot of paper-based information within organizations: unstructured content that needs to be digitized and structured before it can be used in business processes. But where does the digitization process start, and how does it take place?
We start by using optical character recognition (OCR) engines to digitize the content. Next, we need to decide which workflow the digitized content should enter, because different types of input call for different processing steps.
The workflow for unstructured input will most likely need a manual processing step. Even though systems are getting better at understanding different aspects of language, automating this remains costly and difficult. With manual processing, a user decides what information needs to be saved and what kind of document he or she is handling.
When working with semi-structured input, most of the information needed to digitize or process the document is available on the document itself. However, when that information is insufficient, unreadable or unusable, an operator has to decide what to do with the document.
Structured documents usually contain usable information. An optional manual check can be added to catch user errors and to serve as an extra fail-safe.
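To make the three workflows above a little more concrete, here is a minimal sketch in Python of how a digitized document could be routed after the OCR step. It is an illustration only, not a description of any particular capture product: the class names, the flags `fields_usable` and `extra_check`, and the queue names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class InputType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass
class CapturedDocument:
    input_type: InputType
    # Hypothetical flag: the OCR output is sufficient, readable and usable.
    fields_usable: bool = True
    # Hypothetical flag: an optional fail-safe check was requested.
    extra_check: bool = False


def route(doc: CapturedDocument) -> str:
    """Return the name of a (hypothetical) processing queue for a document."""
    if doc.input_type is InputType.UNSTRUCTURED:
        # Unstructured input: a user decides what to save and what it is.
        return "manual_processing"
    if doc.input_type is InputType.SEMI_STRUCTURED:
        # Semi-structured input: an operator steps in only when the
        # information is insufficient, unreadable or unusable.
        return "automatic" if doc.fields_usable else "operator_review"
    # Structured input: automatic, with an optional manual check.
    return "manual_check" if doc.extra_check else "automatic"


if __name__ == "__main__":
    invoice = CapturedDocument(InputType.SEMI_STRUCTURED, fields_usable=False)
    print(route(invoice))
```

In this sketch, an unreadable invoice ends up with an operator, while a clean passport scan goes straight through unless the extra check is switched on.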
3. Challenges in the process
Even when working with structured documents, additional checks get built in for various scenarios. One of these is user-made errors. Imagine somebody spilled coffee over an important identification number, a fold is covering an important piece of information, or maintenance of the scanning hardware has been lacking. Did you know that ordinary dust can make scans unusable?
Another scenario is data loss. It may occur when unstructured content gets digitized and the OCR engine cannot read certain characters, such as handwritten ones. In other words, one factor cannot be left out of the digitization process: the human.
Jerry Rosenau is a Junior Business Consultant at Amplexor, based in Eindhoven, The Netherlands. After completing an internship in the content capture team following college graduation, Jerry now focuses on business process analysis and on supporting the sales team in advising customers about the best possible capture solutions.