Recent updates
We offer high-level descriptions of what's new and any bug fixes we developed in our platform over time.
April 2023
Exciting news - we have a new Receipt Validation feature for customer engagement companies that use our services to validate receipts as proof of purchase.
We are partnering with two incredible clients to launch the receipt validation feature with extensive collaboration, testing and feedback.
Our cutting-edge technology is now capable of extracting data from Japanese receipts - thanks to the power of OpenAI!
More breakthroughs in accuracy when it comes to merchant names!
September 2022
Improved accuracy of the Total Amount field as we started seeing more diverse receipt formats coming through.
We've re-adjusted the confidence level threshold for OCR text to boost the precision and robustness of the model.
After some delay due to many iterations of experimenting and testing, the line items feature has recently seen a major breakthrough. We're excited to launch the feature publicly in the coming weeks.
As we and our clients take on more global clients, we've made improvements to the currency code and location extraction.
July 2022
We’ve made some amazing improvements with merchant name and location accuracy by taking advantage of a new Named Entity Recognition (NER) model.
We also improved the way we process and extract information from invoices. This allows the engine to tackle some of the more complex invoice formats.
The OCR has recently become noticeably more robust at recognising handwriting. This is great news for handwritten receipts, especially for restaurant receipts in the US, where the tips and total amount are commonly written by hand.
May 2022
We’ve released a classification engine to categorise transactions. This is special because the labels are customisable and are based on each client.
We have made progress on line items, especially their accuracy. In particular, we reduced duplicates and improved the detection of line items on longer receipts, e.g. grocery receipts in the USA.
Location data on the receipt is important for our customers. We have improved the way we detect and verify the location accurately.
Some of our customers use Geonames ID to identify and map the location accurately. We have also improved this feature so that it chooses the right location.
Detection of correct date format has also been improved specifically for receipts from the USA, where the date format is primarily month first.
Internally, we built a context classifier to differentiate receipts, invoices, and mobile screenshots. This is to fine-tune the results to achieve a better accuracy rate. We have improved the context classifier accuracy with new training data sets.
March 2022
Product Line Items performance has been improved - the response time has been reduced by 10-15%.
The Australian Business Number verification process is more robust - as Taggun is now hosting the database in-house and no longer relies on the 3rd Party API.
As requested by a customer, Business Type Classification development is underway and is targeted to be released in June. This will help Taggun customers in the expense management industry to automatically classify their customers' spend.
February 2022
Last cycle was a very short one and the focus for this one is Line Items. We will have a lot more to report on that soon!
Get in touch if you want us to enable line items for you - and please give us feedback.
December 2021
Made further accuracy and performance improvements for line items extraction for beta customers.
Improved the way we perform OCR on PDF files, especially for the non-standard PDF format files.
Investigated the threat of Log4j security vulnerabilities in our system to ensure it doesn't impact our customers.
October 2021
For our cloud customers, we upgraded to use AWS Global Accelerator. We measured over 30% faster time-to-first-byte in some regions. On top of that, cross-region failover is now instantaneous (without having to wait for DNS to propagate), which minimises API downtime.
Improved accuracy for the total amount, especially for receipts in the Europe region.
Improved payment type detection to recognise new payment types in the Asia Pacific region, with better detection for Visa, MasterCard and American Express.
Performance just got better. We reduced the average API response time by improving our parallel processing to squeeze more out of the few seconds we have for each receipt.
Available for Beta Testing ✌️ - Line item details (name, quantity, unit price, and total price) of products detected on a receipt. Please respond to this email if you would like to try it out now.
August 2021
Migrated our cloud customers to use Microsoft for the underlying image-to-text OCR, offering better accuracy in all languages, including better support for handwriting.
Our API monitoring just got better. We set up better APM monitoring tools for our API so we can better respond to any delay or spike of traffic for our system.
We improved the merchant name model to avoid picking up some generic stop words.
July 2021
Fixed a security vulnerability. We now only accept HTTPS scheme when domains are passed to the URL parameter. This is to avoid DNS rebinding attacks.
We did the groundwork to seamlessly migrate the underlying image-to-text OCR to use Microsoft OCR service.
We also improved the detection of GeoNames locations for city names with accents.
We fixed a few regression issues with Invoice number extraction.
June 2021
Moved a cloud-hosted server from the UK to the France region to serve our customers in the EU since Brexit. The endpoint https://api-uk.taggun.io will be removed.
For our Australian customers, we fine-tuned the ABN extraction algorithm to improve the accuracy rate.
Cities with shorter names (e.g. Lyon) were sometimes missed. We fixed a weakness in our GeoNames city detection to improve the detection of these cities.
We improved the accuracy for mobile screenshots of receipts from ride-sharing apps in Asia Pacific.
We've tinkered with the receipt number detection to improve the accuracy for receipts in Hong Kong.
April 2021
For UK receipts, we have implemented HMRC services for merchantVerification, since the UK VAT validation service no longer works directly with VIES (EU initiative).
We doubled the size of our cloud hosted servers to improve response times and stability under load.
Continued to improve merchant name accuracy, and we are happy to have received good feedback on this from our customers.
Improved the handling of receipts from Singapore, especially on the currency code and amounts.
March 2021
For receipts from the UK, we have implemented HMRC services for merchantVerification, since the UK VAT validation service no longer works directly with VIES (EU initiative).
We doubled the size of our cloud hosted servers to improve response times and stability under load.
We continued to improve merchant name accuracy, and we are happy to have received good feedback on this from our customers.
We improved the handling of receipts from Singapore, especially on the currency code and amounts.
February 2021
We've introduced a multi-tax feature. We can now detect and extract multiple tax rates and tax amounts (if available) on a receipt. Please reach out if you wish to enable this feature for your account.
We fixed a peculiar and intermittent bug with PDF file processing that caused 504 error responses for our customers.
We resolved a reported degradation with the ignoreMerchantName parameter. We found and fixed a weakness in our model when handling merchant names with a lower confidence level.
We improved our Swagger API documentation, so that it is easier for our new users to learn and use the endpoints.
January 2021
We made major improvements on merchant name extraction to allow better detection of shop names on receipts, especially for French receipts.
Fixed a minor weakness in our detection for date format.
We've also improved the quality of the OCR text content.
December 2020
For our European customers, we added a new feature to detect the VAT Tax ID and retrieve the up-to-date VIES merchant information from the official EU website. When you make a request to the verbose endpoints, the new property entities.merchantVerification will be available if we detect a match on the receipt or invoice.
We can also now re-align the text on curled up receipts to improve our analysis of the text.
We fixed a bug where some street addresses were being returned for French receipts.
November 2020
We've improved the tax amount extraction for Australian and New Zealand receipts.
We've further improved the accuracy of date and time extraction to avoid picking up the wrong dates on utility bills.
We continue to monitor the accuracy rate and improve the accuracy of merchant names for each of our customers.
By the way, we renewed our servers' SSL security certificates to keep things safe and sound.
October 2020
Date and time are now extracted using Taggun's new engine. We have also enhanced features like enriching the date result with EXIF metadata in JPEG files.
We implemented an automatic feedback and training pipeline to improve Merchant Name accuracy using location data from Google Places.
We implemented an OCR augmentation in the new engine to specifically handle common OCR mistakes, for example correcting "2S Sept 2020" to "25 Sept 2020".
In addition to Receipt and Invoice, we added a new context called Fapiao to better handle special Chinese official tax invoices in the new engine.
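As an illustration of the OCR augmentation mentioned above (correcting "2S Sept 2020" to "25 Sept 2020"), here is a minimal sketch. It is not Taggun's engine: the look-alike table, the month pattern and the function name are all assumptions made for this example. It only rewrites the day position of a date-like phrase, and only when that token already contains a real digit.

```python
import re

# Hypothetical table of letters that OCR commonly confuses with digits.
CONFUSABLE = str.maketrans({"S": "5", "O": "0", "I": "1", "l": "1", "B": "8", "Z": "2"})

# English month names or abbreviations (an assumption; other languages need more).
MONTHS = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"

# A one- or two-character "day" token, then a month name, then a 4-digit year.
DATE_LIKE = re.compile(rf"\b([0-9SOIlBZ]{{1,2}})(\s+{MONTHS}\.?\s+\d{{4}})\b")


def _fix(match: re.Match) -> str:
    day = match.group(1)
    # Only touch tokens that already contain a digit, to limit false positives.
    if any(ch.isdigit() for ch in day):
        day = day.translate(CONFUSABLE)
    return day + match.group(2)


def fix_day_digits(text: str) -> str:
    """Replace digit look-alike letters in the day position of a date."""
    return DATE_LIKE.sub(_fix, text)


print(fix_day_digits("Date: 2S Sept 2020"))  # prints "Date: 25 Sept 2020"
```

Restricting the substitution to a date-shaped context keeps ordinary words elsewhere in the OCR text untouched.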
September 2020
Merchant name is now extracted with Taggun's new engine.
Using the new engine, we will now automatically detect your document format (eg: receipt, invoice) to process them separately to achieve better results.
For companies that rely on the time component of dates, we have improved accuracy for date/time extraction especially if the time component is on a separate block in the receipt.
For receipts coming from Sweden, we have improved the extraction of Swedish company names as a merchant name entity.
August 2020
We addressed a bug where future receipt dates were selected in certain scenarios.
We fixed an annoying and intermittent issue that occurred in our continuous deployment process.
July 2020
We've improved the calculation of the overall confidence level to decay the score if the receipt image is skewed or has poor OCR text quality. Our aim is to recommend that the receipt be marked for review if the confidence level falls below 0.8.
To improve the accuracy of our results, we now have a more robust way to correct the orientation of an image that is rotated by 90°, 180° or 270° before sending it for OCR.
At TAGGUN, safety and security are paramount. We used a top security consulting company in NZ to perform security penetration tests and source code reviews for us. There were no critical findings. Phew!
We resolved a few smaller security findings immediately and made a significant improvement to our AWS security practice by using a bastion account.
June 2020
Entities: payment type and receipt number are now extracted using Taggun's new engine.
To prevent Server Side Request Forgery, we have stopped accepting IPv6 addresses as file URLs in all our API endpoints.
We also improved the processing of merchant names with enriched data from Foursquare.
Fixed a bug that caused intermittent language code errors.
Gracefully handled PDF files that cause request errors.
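The Server Side Request Forgery protection described above could look roughly like the sketch below. This is an assumption-laden illustration, not the checks Taggun actually runs: the function name is hypothetical, the HTTPS-only rule is borrowed from the July 2021 notes, and rejecting non-public IPv4 literals is an extra precaution added here.

```python
import ipaddress
from urllib.parse import urlsplit


def is_acceptable_file_url(url: str) -> bool:
    """Return True if the URL passes the illustrative checks below."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # e.g. a malformed bracketed IPv6 host
    if parts.scheme != "https":
        return False
    host = parts.hostname  # brackets around IPv6 literals are already stripped
    if not host:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal, so treat it as a domain name
    # Reject IPv6 literals outright, and any non-public IPv4 literal as well.
    return address.version == 4 and address.is_global


print(is_acceptable_file_url("https://example.com/receipt.jpg"))
print(is_acceptable_file_url("https://[::1]/receipt.jpg"))
```

A check on the URL string alone does not stop a domain name from resolving to an internal address, so a real service would also need to validate the resolved address at fetch time.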
May 2020
Added fast in-memory BM25 text search for more accurate and faster identification of country in the near parameter.
More tinkering on our receipt scanning accuracy benchmarking tool, so we can streamline the process to ingest data from the feedback API.
We have also improved the payment types extraction to support a wider variety of payment processors.
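For readers unfamiliar with BM25, the in-memory text search mentioned above can be sketched as follows. This is a generic Okapi BM25 scorer with commonly used defaults (k1 = 1.5, b = 0.75) and a non-negative IDF variant. The class name, the tokenizer and the tiny country list are assumptions for this example and say nothing about how the near parameter is actually resolved.

```python
import math
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class BM25Index:
    def __init__(self, documents: List[str], k1: float = 1.5, b: float = 0.75):
        self.documents = documents
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(d)) for d in documents]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_length = sum(self.lengths) / len(documents)
        # Document frequency: in how many documents each term appears.
        self.doc_freq = Counter(t for c in self.term_counts for t in c)

    def _idf(self, term: str) -> float:
        n = self.doc_freq.get(term, 0)
        return math.log((len(self.documents) - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: str, index: int) -> float:
        counts, length = self.term_counts[index], self.lengths[index]
        # Length normalisation: longer documents need more matches to score well.
        norm = self.k1 * (1 - self.b + self.b * length / self.avg_length)
        return sum(
            self._idf(t) * counts[t] * (self.k1 + 1) / (counts[t] + norm)
            for t in tokenize(query)
            if t in counts
        )

    def best_match(self, query: str) -> Optional[str]:
        scores = [self.score(query, i) for i in range(len(self.documents))]
        top = max(range(len(scores)), key=scores.__getitem__)
        return self.documents[top] if scores[top] > 0 else None


countries = ["New Zealand", "New Caledonia", "Australia", "United States of America"]
index = BM25Index(countries)
print(index.best_match("auckland new zealand"))  # prints "New Zealand"
```

Because everything lives in plain Python containers, a lookup needs no network or database round trip, which is the kind of speed benefit the note refers to.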
April 2020
We improved the way we handle receipts from France and Hong Kong, particularly to improve the accuracy of the total amount.
We added a new feature to return the merchant postal code when detected on a receipt. This is great if you are using TAGGUN to validate receipts as proof of purchase for digital loyalty campaigns.
We have added enhanced merchant address detection for the Australian market.
March 2020
We have upgraded our third-party dependencies to avoid any security vulnerabilities. A nice side effect is we now have a brand new Swagger UI for our API tester and documentation.
New feature to detect city or country name on a receipt. This is great if you want to pre-populate the location field of the expense for your app.
For those of you who like to have separate billing for different sub-accounts under one API key, please let us know. We have now added the ability for you to do just that.
We added the ability to extract payment type (Eg: Visa, MasterCard, or Cash) from the receipts.
Fighting spammers. We had to implement some high-level protection against abuse on our demo site. Annoying cat and mouse game. But, fun stuff.
Occasionally, the PDF rendering can cause some issues in our API. We have fixed this with a fork of Google's PDFium library for node. And we have released it as an open-source package at https://www.npmjs.com/package/@taggun/pdfium
February 2020
We rolled out a new model to detect location from the text of the receipts and invoices.
Previously we relied on the "near" parameter of IP address to detect the location... Now, we have improved the algorithm to better detect the city and country name from the receipt.
We use a list of city and country names from https://www.geonames.org/countries/ with a population above 150,000 to train the NER model.
Previously, if the request header content-type is not set as application/json, we would still process the content as JSON type. We're now performing stricter checks on the content-type request header for safety.
For those who love to get the accurate time from the receipts, good news, you can now set "extractTime" to true and we will return the time if we can detect it from the receipt.
Occasionally, a valid PDF file will be deemed as unsafe by our security check... Now we allow some tolerance for PDF files with an unexpected signature in the file bytes.
For those who serve Australian customers, an Australian Business Number (ABN) is mandatory on Australian invoices. We now integrate with the Australian government's API to return enriched data from the ABN registry.
January 2020
Added new feature to detect location from the text of the receipts and invoices.
We received feedback that the 418 error status code is non-standard and was causing customers' applications to retry requests when the file is not a receipt. So, we now return success code 200 with empty results for each property if there is no text detected on the receipt.
If the file type is PDF, and content-type is "application/octet-stream" (a common mistake made by other applications), we will now try to handle this as a PDF file gracefully for you as application/pdf content-type.
If we detect unmatched content-type and file type for security check purposes, we now return the 400 error status code.
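The two content-type rules above can be pictured with a small sketch. It is an illustration only: the signature table, the function names and the exact decision order are assumptions, not the API's real implementation or allow-list.

```python
from typing import Optional, Tuple

# Leading bytes ("magic numbers") for a few file types. The set of types is an
# assumption for this sketch, not the API's actual allow-list.
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def sniff_type(data: bytes) -> Optional[str]:
    """Guess the file type from its first bytes, or None if unknown."""
    for mime, magic in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def resolve_content_type(declared: str, data: bytes) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, effective content-type) for an upload."""
    declared = declared.split(";")[0].strip().lower()
    detected = sniff_type(data)
    # A PDF sent as application/octet-stream is quietly treated as a PDF.
    if declared == "application/octet-stream" and detected == "application/pdf":
        return 200, "application/pdf"
    # Any other disagreement between header and file bytes is rejected.
    if detected is None or detected != declared:
        return 400, None
    return 200, declared


print(resolve_content_type("application/octet-stream", b"%PDF-1.7 example"))
```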
December 2019
Added new feature to integrate with Australian Business Number (ABN) and return enriched data from ABN registry.
Improved error handling for files that are not receipts.
Improved handling of PDF files with application/octet-stream content-type.
November 2019
Improved support for multiple languages with new Natural Language Processing and Named Entity Recognition packages and processing.
Improved date accuracy for US date format
October 2019
Added new feature to extract QR Data when QR barcode is detected on the receipt or files.
Added new feature to extract special information for official Chinese Fapiao Invoices.
September 2019
Improved accuracy on totalAmount to avoid picking up "total cash paid".
Improved accuracy for invoiceNumber.
August 2019
Improved PDF processing to have a wider support for non-standard fonts.
Improved request performance for PDF files processing.
Added strict request parameters validation for MD5.
Added itemCounts extraction to detect the number of items in a receipt.
July 2019
Added support for Hungarian receipts.
Improved overall accuracy extraction with better detection of space and symbols in the OCR text.
June 2019
Improved date extraction by using keywords to pick up the correct invoice date and to avoid other dates such as the delivery date, due date and period date.
Improved stability and performance by tuning the memory usage of containers.
May 2019
Improved request response time with caching of merchantName. The cache will refresh every 24 hours.
Updated all third party dependencies to avoid security vulnerabilities.
Added support for Romanian receipts.
April 2019
Improved invoice number extraction by using filename.
Improved image preprocessing optimisation for OCR text.
March 2019
Added ignoreMerchantName parameter to avoid extracting the invoice recipient's name as the merchant name.
Improved invoice number extraction.
Added a new server in the UK.
February 2019
Improved amount and tax extraction for Nordic receipts.
Improved tax extraction for GST in Australia and New Zealand.
January 2019
Upgraded 3rd party library dependencies.
Improved API performance by reducing memory usage.
December 2018
Upgraded all software dependencies to the latest stable version.
Improved API performance by reducing processing time by up to 10%.
November 2018
Deployed a new server in the UK region to handle the increased load in Europe.
Uploaded files are now inspected and validated against the allowed file types.
October 2018
Added near parameter to accept a geo location to validate and enrich the merchant details.
September 2018
Added currencyCode property to totalAmount and taxAmount which returns three-letter ISO currency code.
Removed redundant endpoint /api/article/v1/verbose/file.
Removed redundant endpoint /api/locale/v1/location.
August 2018
Added merchantCity, merchantState and merchantCountryCode properties to be returned when merchantName or merchantAddress is found.
June 2018
Added a new endpoint /api/account/v1/merchantname/add to help improve the accuracy of merchant name for your account.
May 2018
Improved accuracy for merchantName and merchantTypes with improved algorithm and merchant information data provider.
Added a feature to extract and detect spaces as thousand separators for receipts in Belgium. This feature is not enabled by default; email to request this feature to be enabled for your account.
April 2018
Improved accuracy for receipts from Brazil.
March 2018
Added feature to extract VAT numbers from Belgium. This feature is not enabled by default; email to request this feature to be enabled for your account.
Improved accuracy for receipts from Belgium
February 2018
Added a new endpoint and data centre https://api-uk.taggun.io in the UK, providing lower latency and faster throughput for customers in the Europe region.
December 2017
When an API request encounters an error, the endpoint now returns HTTP status code 400 with a short description of the error.
Improved accuracy for date format detection and extraction.
September 2017
Released an alpha version of the invoice number and IBAN (bank account number) extraction under the property of entities.
Improved line item description and amount detection and accuracy.
August 2017
Increased accuracy for extracting totalAmount using subset sum algorithm.
Released an alpha version of line item description and amount under the property of lineAmounts. This will detect the description and the amounts found on the receipt.
July 2017
Enhanced merchantName, merchantAddress and merchantType to feed-forward production data as training data for Machine Learning to improve accuracy.
Increased accuracy by rotating the image 90 degrees if receipt is detected to be landscape.
Added support to extract dates without year information, e.g. JULY'24. The engine will assume the year is the current year.
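The yearless date rule above (JULY'24 read as 24 July of the current year) can be sketched like this. The pattern, the month table and the function name are assumptions for illustration; the engine's real parsing rules are not documented here.

```python
import re
from datetime import date
from typing import Optional

# English month abbreviations mapped to month numbers (an assumption).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}

# A month word, an apostrophe, then a one- or two-digit day, e.g. JULY'24.
YEARLESS = re.compile(r"\b([A-Za-z]{3,9})\s*'\s*(\d{1,2})\b")


def parse_yearless_date(text: str, today: Optional[date] = None) -> Optional[date]:
    """Parse a month-and-day token, assuming the year is the current year."""
    today = today or date.today()
    match = YEARLESS.search(text)
    if not match:
        return None
    month = MONTHS.get(match.group(1)[:3].upper())
    if month is None:
        return None
    try:
        return date(today.year, month, int(match.group(2)))
    except ValueError:
        return None  # e.g. day 31 in a 30-day month


print(parse_yearless_date("JULY'24", today=date(2017, 7, 1)))  # prints "2017-07-24"
```

Passing today explicitly keeps the function easy to test; in normal use it falls back to the system date.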
June 2017
Converted to use PDFium as the PDF reader, reducing PDF conversion time by 70%.
Reduced time to load file for /api/receipt/v1/simple/url by using in-memory buffer.
May 2017
Added /api/receipt/v1/match/file endpoint to recognise if an image of a receipt contains certain keywords (supplied by the client) and return a result indicating whether the image is a likely or unlikely match to the supplied keywords.
Added optional request parameters to support languages other than English for OCR. Currently supporting: en, es, fr, jp, he, iw
Added a new property numbers in verbose mode to return any detected numbers on a receipt or article. This is useful for those who want to detect any reference numbers or codes on a receipt.
April 2017
Added line amounts in the verbose results. Line amounts are one or more amounts found on the receipt that sum exactly to the detected total amount of the receipt. This is implemented using the subset sum algorithm. It also greatly improves the confidenceLevel calculation.
Published this help documentation for Taggun. Even though Swagger documentation is great for up-to-date information for the APIs, there are still other aspects in Taggun that I would like to share with everyone.
Added /api/article/v1/verbose/file endpoint to perform OCR on documents other than receipts. This is useful to just do a simple image-to-text dump and capture any metadata like dates and amounts that are found on the image.
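The subset sum idea behind line amounts, mentioned earlier in this month's notes, can be sketched as below. This is a textbook dynamic-programming version written for illustration, not Taggun's implementation. It assumes amounts are positive, are given in integer cents to avoid floating-point rounding, and that the total itself has already been removed from the candidate list. The function name is hypothetical.

```python
from typing import Dict, List, Optional


def find_line_amounts(amounts_cents: List[int], total_cents: int) -> Optional[List[int]]:
    """Return one subset of amounts that sums exactly to the total, or None."""
    # reachable maps a sum to the indices of one subset that produces it.
    reachable: Dict[int, List[int]] = {0: []}
    for i, amount in enumerate(amounts_cents):
        # Snapshot the items so each amount is used at most once.
        for subtotal, indices in list(reachable.items()):
            new_sum = subtotal + amount
            if new_sum <= total_cents and new_sum not in reachable:
                reachable[new_sum] = indices + [i]
    if total_cents == 0 or total_cents not in reachable:
        return None
    return [amounts_cents[i] for i in reachable[total_cents]]


# Amounts detected on a receipt (in cents) and the detected total of 20.49.
print(find_line_amounts([450, 1200, 399, 1000], 2049))  # prints [450, 1200, 399]
```

Finding such a subset is also a useful signal that the detected total is right, which fits the note about the confidenceLevel calculation.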
March 2017
Modified the receipt endpoints to make the structure of the response uniform and easier to understand.
Added a dump of all detected amounts on a receipt to meet a feature request from a Reddit user. Follow the link to see the Reddit post.
Started the Taggun blog. Apparently a blog is the most effective tool for digital marketing?! Check it out --> https://blog.taggun.io
Woohoo... Taggun has a bot!! His name is Marvin and he is Taggun's admin assistant. You can ask him to do a few basic administrative tasks now. Get to know more about Marvin.
There are many wonderful things that have happened before March 2017, but they are not logged here.