Date and time are now extracted using Taggun's new engine. We have also enhanced features like enriching the date result with EXIF metadata in JPEG files.
We implemented an automatic feedback and training pipeline to improve Merchant Name accuracy using location data from Google Places.
We implemented an OCR augmentation in the new engine to handle common OCR mistakes; for example, correcting "2S Sept 2020" to "25 Sept 2020".
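As a rough illustration of this kind of correction, the sketch below swaps letters that OCR commonly confuses with digits, but only inside the day and year tokens of a date-like string. The confusion table and the fix_ocr_date helper are assumptions made for this example, not Taggun's actual implementation.

```python
import re

# Letters that OCR often confuses with digits (illustrative, not exhaustive).
OCR_DIGIT_FIXES = str.maketrans({"S": "5", "O": "0", "o": "0", "I": "1", "l": "1", "B": "8"})

# Day token (must contain at least one real digit), month name, four-character year token.
DATE_PATTERN = re.compile(
    r"\b(?P<day>\d[0-9SOoIlB]?|[SOoIlB]\d)"
    r"(?P<mid>\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+)"
    r"(?P<year>[0-9SOoIlB]{4})\b"
)


def fix_ocr_date(text: str) -> str:
    """Repair digit look-alikes in the day and year parts of dates found in text."""

    def repair(match: re.Match) -> str:
        day = match["day"].translate(OCR_DIGIT_FIXES)
        year = match["year"].translate(OCR_DIGIT_FIXES)
        return day + match["mid"] + year  # the month text is left untouched

    return DATE_PATTERN.sub(repair, text)


print(fix_ocr_date("2S Sept 2020"))
```

Restricting the substitution to date-shaped tokens keeps the rest of the receipt text unchanged, so a word like "Sold" is never rewritten.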
In addition to Receipt and Invoice, we added a new context called Fapiao to better handle special Chinese official tax invoices in the new engine.
Merchant name is now extracted with Taggun's new engine.
Using the new engine, we now automatically detect your document format (e.g. receipt or invoice) and process each format separately to achieve better results.
For companies that rely on the time component of dates, we have improved the accuracy of date/time extraction, especially when the time component is in a separate block on the receipt.
For receipts coming from Sweden, we have improved the extraction of Swedish company names as a merchant name entity.
We addressed a bug where future receipt dates were selected in certain scenarios.
We fixed an annoying and intermittent issue that occurred in our continuous deployment process.
We've improved the calculation of the overall confidence level so that the score decays if the receipt image is skewed or has poor OCR text quality. Our aim is to recommend that a receipt be marked for review if its confidence level falls below 0.8.
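On the client side, the 0.8 recommendation could be applied with a check like the one below. This is a minimal sketch: it assumes the parsed response is a dictionary with a top-level confidenceLevel number, and needs_review is a hypothetical helper name.

```python
REVIEW_THRESHOLD = 0.8  # recommended cut-off from the note above


def needs_review(result: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag a scan for manual review when confidence is missing or below the threshold."""
    confidence = result.get("confidenceLevel")  # assumed response field
    return confidence is None or confidence < threshold


print(needs_review({"confidenceLevel": 0.65}))
```

Treating a missing score as "needs review" is a deliberately cautious choice for this sketch.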
To improve the accuracy of our results, we now have a more robust way to correct the orientation of an image before sending it for OCR if the image is rotated by 90°, 180° or 270°.
At TAGGUN, safety and security are paramount. We used a top security consulting company in NZ to perform security penetration tests and source code reviews for us. No critical issues were found. Phew!
We resolved a few smaller security findings immediately and significantly improved our AWS security practice by using a bastion account.
Entities: payment type and receipt number are now extracted using Taggun's new engine.
To prevent Server Side Request Forgery, we have stopped accepting IPv6 addresses as file URLs in all our API endpoints.
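A check of this kind can be sketched with Python's standard library. The is_allowed_file_url helper and the http/https-only rule are assumptions for illustration; a complete SSRF defence would also need to validate the addresses a hostname resolves to.

```python
import ipaddress
from urllib.parse import urlsplit


def is_allowed_file_url(url: str) -> bool:
    """Reject malformed URLs, non-HTTP schemes and IPv6 literal hosts."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # brackets around IPv6 literals are stripped here
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        return ipaddress.ip_address(host).version != 6
    except ValueError:
        return True  # not an IP literal, so it is a DNS name


print(is_allowed_file_url("http://[::1]/receipt.jpg"))
```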
We also improved the processing of merchant names with enriched data from FourSquare.
Fixed a bug that caused intermittent language code errors.
Gracefully handled PDF files that cause request errors.
Added fast in-memory BM25 text search for more accurate and faster identification of country in the near parameter.
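For readers unfamiliar with BM25, the sketch below ranks a handful of country "documents" against a free-text location using the standard Okapi BM25 formula. The tiny corpus, the tokenizer and the BM25Index class are assumptions for this example and do not reflect Taggun's data or code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class BM25Index:
    """Minimal in-memory Okapi BM25 index keyed by label (here, a country code)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.labels = list(documents)
        self.counts = [Counter(tokenize(documents[label])) for label in self.labels]
        self.lengths = [sum(c.values()) for c in self.counts]
        self.avg_length = sum(self.lengths) / len(self.lengths)
        # Number of documents that contain each term.
        self.doc_freq = Counter(term for c in self.counts for term in c)

    def score(self, query_terms, i):
        total, n_docs = 0.0, len(self.labels)
        for term in query_terms:
            freq = self.counts[i][term]
            if freq == 0:
                continue
            df = self.doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = freq + self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avg_length)
            total += idf * freq * (self.k1 + 1) / norm
        return total

    def best_match(self, query):
        terms = tokenize(query)
        scores = [self.score(terms, i) for i in range(len(self.labels))]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.labels[best] if scores[best] > 0 else None


country_index = BM25Index({
    "NZ": "new zealand auckland wellington christchurch",
    "AU": "australia sydney melbourne brisbane perth",
    "GB": "united kingdom england london manchester",
})
print(country_index.best_match("Auckland, New Zealand"))
```

Because the whole index lives in plain dictionaries, lookups need no network or database round trip, which is the property the entry above refers to as "in-memory".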
More tinkering on our receipt scanning accuracy benchmarking tool, so we can streamline the process to ingest data from the feedback API.
We have also improved the payment types extraction to support a wider variety of payment processors.
We improved the way we handle receipts from France and Hong Kong, particularly the accuracy of the total amount.
We added a new feature to return the merchant postal code when detected on a receipt. This is great if you are using TAGGUN to validate receipts as proof of purchase for digital loyalty campaigns.
We have enhanced merchant address detection for the Australian market.
We have upgraded our third-party dependencies to avoid any security vulnerabilities. A nice side effect is we now have a brand new Swagger UI for our API tester and documentation.
New feature to detect city or country name on a receipt. This is great if you want to pre-populate the location field of the expense for your app.
For those of you who like to have separate billing for different sub-accounts under one API key, please let us know. We have now added the ability for you to do just that.
We added the ability to extract the payment type (e.g. Visa, MasterCard, or Cash) from receipts.
Fighting spammers. We had to implement some high-level protection against abuse on our demo site. Annoying cat and mouse game. But, fun stuff.
Occasionally, the PDF rendering can cause some issues in our API. We have fixed this with a fork of Google's PDFium library for node. And we have released it as an open-source package at https://www.npmjs.com/package/@taggun/pdfium
We rolled out a new model to detect location from the text of the receipts and invoices.
Previously we relied on the "near" parameter of IP address to detect the location... Now, we have improved the algorithm to better detect the city and country name from the receipt.
We use a list of city and country names from https://www.geonames.org/countries/ with a population above 150,000 to train the NER model.
Previously, if the content-type request header was not set to application/json, we would still process the content as JSON. We now perform stricter checks on the content-type request header for safety.
For those who love to get the accurate time from the receipts, good news, you can now set "extractTime" to true and we will return the time if we can detect it from the receipt.
Occasionally, a valid PDF file will be deemed as unsafe by our security check... Now we allow some tolerance for PDF files with an unexpected signature in the file bytes.
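One way to picture this tolerance: instead of requiring the "%PDF-" marker at byte 0, accept it anywhere near the start of the file. The sketch below uses a 1024-byte window, which is an assumption borrowed from common PDF reader behaviour rather than Taggun's documented rule; looks_like_pdf is a hypothetical helper.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes, tolerance: int = 1024) -> bool:
    """Return True if the PDF marker starts within the first `tolerance` bytes."""
    # The search window is extended so a marker starting at the last allowed offset still fits.
    return data.find(PDF_MAGIC, 0, tolerance + len(PDF_MAGIC) - 1) != -1


# A byte-order mark and a newline before the header are tolerated.
print(looks_like_pdf(b"\xef\xbb\xbf\n%PDF-1.4\n"))
```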
For those who serve Australian customers, an Australian Business Number (ABN) is mandatory on Australian invoices. We now integrate with the Australian government's API to return enriched data from the ABN registry.
Added new feature to detect location from the text of the receipts and invoices.
We received feedback that the 418 error status code is non-standard and was causing client applications to retry requests when the file is not a receipt. So, we now return success code 200 with empty results for each property if there is no text detected on the receipt.
If the file type is PDF, and content-type is "application/octet-stream" (a common mistake made by other applications), we will now try to handle this as a PDF file gracefully for you as application/pdf content-type.
If our security check detects a mismatch between the content-type and the actual file type, we now return the 400 error status code.
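The content-type rules above (octet-stream PDFs accepted as application/pdf, other mismatches rejected with 400) could look roughly like this. The signature table, the list of supported types and the check_upload helper are assumptions for illustration only.

```python
# Leading bytes ("magic numbers") for a few file types; illustrative, not exhaustive.
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}


def sniff_type(data: bytes):
    """Guess the MIME type from the leading bytes, or return None if unknown."""
    for mime, magic in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def check_upload(content_type: str, data: bytes):
    """Return (status_code, effective_content_type) for an uploaded file."""
    declared = content_type.split(";")[0].strip().lower()
    detected = sniff_type(data)
    if declared == "application/octet-stream" and detected == "application/pdf":
        return 200, "application/pdf"  # common client mistake, handled gracefully
    if detected is None or detected != declared:
        return 400, None  # declared type does not match the file bytes
    return 200, detected


print(check_upload("application/octet-stream", b"%PDF-1.4\n"))
```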
Added new feature to integrate with Australian Business Number (ABN) and return enriched data from ABN registry.
Improved error handling for files that are not receipts
Improved handling of PDF files with application/octet-stream content-type
Improved support for multiple languages with new Natural Language Processing and Named Entity Recognition packages and processing.
Improved date accuracy for US date format
Added new feature to extract QR data when a QR code is detected on the receipt or file.
Added new feature to extract special information for official Chinese Fapiao Invoices.
Improved accuracy on totalAmount to avoid picking up "total cash paid"
Improved accuracy for invoiceNumber
Improved PDF processing to support a wider range of non-standard fonts.
Improved request performance for PDF files processing.
Added strict request parameters validation for MD5.
Added itemCounts extraction to detect the number of items in a receipt.
Added support for Hungarian receipts.
Improved overall accuracy extraction with better detection of space and symbols in the OCR text.
Improved date extraction by using keywords to pick up the correct invoice date and to avoid other dates such as the delivery date, due date or period date.
Improved stability and performance by tuning the memory usage of containers.
Improved request response time with caching of merchantName. The cache will refresh every 24 hours.
Updated all third party dependencies to avoid security vulnerabilities.
Added support for Romanian receipts.
Improved invoice number extraction by using filename.
Improved image preprocessing optimisation for OCR text.
Added ignoreMerchantName parameter to avoid extracting the invoice recipient's name as the merchant name.
Improved invoice number extraction.
Added a new server in UK.
Improved amount and tax extraction for Nordic receipts.
Improved tax extraction for GST in Australia and New Zealand.
Upgraded 3rd party library dependencies.
Improved API performance by reducing memory usage.
Upgraded all software dependencies to the latest stable version.
Improved API performance by reducing processing time by up to 10%.
Deployed a new server in the UK region to handle the increased load in Europe.
Inspect and validate uploaded files with allowed file types.
Added near parameter to accept a geo location to validate and enrich the merchant details.
Added currencyCode property to totalAmount and taxAmount which returns three-letter ISO currency code.
Removed redundant endpoint /api/article/v1/verbose/file.
Removed redundant endpoint /api/locale/v1/location.
Added merchantCity, merchantState and merchantCountryCode properties to be returned when merchantName or merchantAddress is found.
Added a new endpoint /api/account/v1/merchantname/add to help improve the accuracy of merchant name for your account.
Improved accuracy for merchantName and merchantTypes with improved algorithm and merchant information data provider.
Added a feature to extract and detect spaces as thousand separators for receipts in Belgium. This feature is not enabled by default; email to request this feature to be enabled for your account.
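To show what treating spaces as thousand separators involves, here is a minimal sketch that parses amounts such as "1 234,56". The accepted format (groups of three digits separated by a space or non-breaking space, with a decimal comma) and the parse_spaced_amount helper are assumptions for this example.

```python
import re
from decimal import Decimal

# Either digit groups separated by a (non-breaking or narrow) space, or plain digits,
# followed by an optional decimal comma part.
SPACED_AMOUNT = re.compile(
    r"(?:\d{1,3}(?:[ \u00a0\u202f]\d{3})+|\d+)(?:,\d{1,2})?"
)


def parse_spaced_amount(token: str):
    """Parse '1 234,56' style amounts into a Decimal; return None if the shape is wrong."""
    if not SPACED_AMOUNT.fullmatch(token):
        return None
    digits = re.sub(r"[ \u00a0\u202f]", "", token)  # drop the thousand separators
    return Decimal(digits.replace(",", "."))


print(parse_spaced_amount("1 234,56"))
```

Requiring exactly three digits after each space is what stops two unrelated neighbouring numbers, such as "1 23,45", from being merged into one amount.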
Improved accuracy for receipts from Brazil.
Added feature to extract VAT numbers from Belgium. This feature is not enabled by default; email to request this feature to be enabled for your account.
Improved accuracy for receipts from Belgium
Added a new endpoint and data centre https://api-uk.taggun.io in the UK, providing lower latency and faster throughput for customers in the Europe region.
When API request encounters an error, the endpoint now returns HTTP status code 400 with a short description of the error.
Improved accuracy for date format detection and extraction.
Released an alpha version of the invoice number and IBAN (bank account number) extraction under the property of entities.
Improved line item description and amount detection and accuracy.
Increased accuracy for extracting totalAmount using a subset sum algorithm.
Released an alpha version of line item description and amount under the property of lineAmounts. This will detect the descriptions and the amounts found on the receipt.
Enhanced merchantName, merchantAddress and merchantType extraction by feeding production data forward as training data for Machine Learning to improve accuracy.
Increased accuracy by rotating the image 90 degrees if the receipt is detected to be in landscape orientation.
Added support for extracting dates without year information, e.g. JULY'24. The engine will assume the year is the current year.
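A minimal sketch of the year-less date rule, assuming English month names and a month-then-day layout as in JULY'24. The parse_yearless_date helper and its today parameter (used to make the "current year" testable) are hypothetical.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"],
    start=1,
)}

# A word followed by an optional apostrophe and a one- or two-digit day, e.g. JULY'24.
MONTH_DAY = re.compile(r"\b([A-Za-z]{3,9})\s*'?\s*(\d{1,2})\b")


def parse_yearless_date(text: str, today: date = None):
    """Return the first month/day found in text, using the current year."""
    today = today or date.today()
    for match in MONTH_DAY.finditer(text):
        month = MONTHS.get(match.group(1)[:3].lower())
        if month is None:
            continue  # the word was not a month name
        try:
            return date(today.year, month, int(match.group(2)))
        except ValueError:
            continue  # impossible day such as Feb 30
    return None


print(parse_yearless_date("JULY'24", today=date(2021, 3, 1)))
```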
Switched to PDFium as the PDF reader, reducing PDF conversion time by 70%.
Reduced the time to load a file for /api/receipt/v1/simple/url by using an in-memory buffer.
Added /api/receipt/v1/match/file endpoint to recognise whether an image of a receipt contains certain keywords (supplied by the client) and return a result indicating whether the image is a likely or unlikely match to the supplied keywords.
Added optional request parameters to support languages other than English for OCR. Currently supporting: en, es, fr, jp, he, iw
Added a new property numbers in verbose mode to return any detected numbers on a receipt or article. This is useful for those who want to detect any reference numbers or codes on a receipt.
Added line amounts in the verbose results. Line amounts are one or more amounts found on the receipt that sum exactly to the detected total amount of the receipt. This is implemented using the subset sum algorithm. It also greatly improves the confidenceLevel calculation.
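The subset sum idea can be sketched as follows: given the candidate amounts detected on a receipt (excluding the total itself), look for a combination that adds up exactly to the total. This brute-force version is an illustration under those assumptions, with a hypothetical find_line_amounts helper; it is exponential in the number of amounts and is not Taggun's implementation.

```python
from decimal import Decimal
from itertools import combinations


def to_cents(value: str) -> int:
    """Convert a decimal string to integer cents to avoid float rounding."""
    return int((Decimal(value) * 100).to_integral_value())


def find_line_amounts(amounts, total):
    """Return the smallest set of amounts that sums exactly to total, or None."""
    target = to_cents(total)
    cents = [to_cents(a) for a in amounts]
    for size in range(1, len(cents) + 1):
        for combo in combinations(range(len(cents)), size):
            if sum(cents[i] for i in combo) == target:
                return [amounts[i] for i in combo]
    return None


# 20.00 is unrelated (for example, cash tendered) and is left out of the match.
print(find_line_amounts(["4.50", "3.25", "2.00", "20.00"], "9.75"))
```

When such a combination exists, the detected total is consistent with the line items, which is one plausible reason a match would raise confidence in the result.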
Published this help documentation for Taggun. Even though the Swagger documentation is great for up-to-date information on the APIs, there are still other aspects of Taggun that I would like to share with everyone.
Added /api/article/v1/verbose/file endpoint to perform OCR on documents other than receipts. This is useful for a simple image-to-text dump that also captures any metadata, like dates and amounts, found on the image.
Modified the receipt endpoints to make the structure of the response uniform and easier to understand.
Added a dump of all detected amounts on a receipt to meet a feature request from a Reddit user. Follow the link to see the Reddit post.
Started the Taggun blog. Apparently a blog is the most effective tool for digital marketing?! Check it out --> https://blog.taggun.io
Woohoo... Taggun has a bot!! His name is Marvin and he is Taggun's admin assistant. You can ask him to do a few basic administrative tasks now. Get to know more about Marvin.