Last updated
Exciting news: we have a new feature for customer engagement companies that use our services to validate receipts as proof of purchase.
We are partnering with two incredible clients to launch the receipt validation feature through extensive collaboration, testing, and feedback.
Our cutting-edge technology can now extract data from Japanese receipts, thanks to the power of OpenAI!
More breakthroughs in accuracy when it comes to merchant names!
Improved accuracy of the Total Amount field as we started seeing more diverse receipt formats coming through.
We've re-adjusted the confidence level threshold for OCR text to boost the precision and robustness of the model.
After some delay due to many iterations of experimenting and testing, the line items feature has recently seen a major breakthrough. We're excited to launch the feature publicly in the coming weeks.
As we and our clients take on more global customers, we've made improvements to currency code and location extraction.
We’ve made some amazing improvements to merchant name and location accuracy by taking advantage of a new Named Entity Recognition (NER) model.
We also improved the way we process and extract information from invoices, allowing the engine to tackle some of the more complex invoice formats.
Recently, the OCR has become noticeably more robust at recognising handwriting. This is great news for handwritten receipts, especially restaurant receipts in the US, where the tip and total amount are commonly written by hand.
We’ve released a classification engine to categorise transactions. What makes it special is that the labels are customisable for each client.
We have made progress on line items, especially their accuracy: we now produce fewer duplicates and detect line items better on longer receipts, e.g., grocery receipts in the USA.
Location data on the receipt is important for our customers. We have improved the way we detect and verify the location accurately.
Detection of the correct date format has also been improved, specifically for receipts from the USA, where dates are primarily written month first.
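Why the month-first convention matters: a numeric date like 03/04/2021 is 3 April in most countries but March 4 in the US. A minimal sketch, assuming the receipt's country is already known; the function name and the `MONTH_FIRST_COUNTRIES` set are hypothetical, not Taggun's implementation:

```python
from datetime import date, datetime

# Hypothetical set of countries whose numeric dates are written month first.
MONTH_FIRST_COUNTRIES = {"US"}


def parse_numeric_date(text: str, country_code: str) -> date:
    """Parse a DD/MM/YYYY or MM/DD/YYYY date depending on the country."""
    if country_code.upper() in MONTH_FIRST_COUNTRIES:
        fmt = "%m/%d/%Y"  # month first, e.g. 03/04/2021 -> March 4
    else:
        fmt = "%d/%m/%Y"  # day first, e.g. 03/04/2021 -> 3 April
    return datetime.strptime(text, fmt).date()


print(parse_numeric_date("03/04/2021", "US"))  # 2021-03-04
print(parse_numeric_date("03/04/2021", "AU"))  # 2021-04-03
```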
Internally, we built a context classifier to differentiate receipts, invoices, and mobile screenshots, so that we can fine-tune the results for each and achieve a better accuracy rate. We have improved the context classifier's accuracy with new training data sets.
Product Line Items performance has been improved - the response time has been reduced by 10-15%.
The Australian Business Number (ABN) verification process is now more robust, as Taggun hosts the database in-house and no longer relies on a third-party API.
As requested by a customer, Business Type Classification development is underway and is targeted for release in June. This will help Taggun customers in the expense management industry automatically classify their customers' spending.
Last cycle was a very short one and the focus for this one is Line Items. We will have a lot more to report on that soon!
Get in touch if you want us to enable line items for you - and please give us feedback.
Made further accuracy and performance improvements for line items extraction for beta customers.
Improved the way we perform OCR on PDF files, especially non-standard PDF files.
Investigated the threat of Log4j security vulnerabilities in our system to ensure it doesn't impact our customers.
For our cloud customers, we upgraded to AWS Global Accelerator. We measured over 30% faster time-to-first-byte in some regions. On top of that, cross-region failover is now instantaneous (without waiting for DNS to propagate), minimising API downtime.
Improved accuracy for the total amount, especially for receipts from Europe.
Improved payment type detection to recognise new payment types in the Asia Pacific region, with better detection of Visa, MasterCard and American Express.
Performance just got better. We reduced the average API response time by improving our parallel processing to squeeze more out of the few seconds we have for each receipt.
Available for Beta Testing ✌️ - Line item details (name, quantity, unit price, and total price) of products detected on a receipt. Please respond to this email if you would like to try it out now.
Migrated our cloud customers to use Microsoft for the underlying image-to-text OCR, offering better accuracy in all languages, including better support for handwriting.
Our API monitoring just got better. We set up better APM (application performance monitoring) tools for our API so we can respond faster to any delays or traffic spikes in our system.
We improved the merchant name model to avoid picking up some generic stop words.
Fixed a security vulnerability. We now accept only the HTTPS scheme when a domain is passed in the URL parameter. This is to avoid DNS rebinding attacks.
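A minimal sketch of the scheme check described above, using Python's standard `urllib.parse`; `is_allowed_file_url` is a hypothetical name, and this covers only the scheme and host test, not Taggun's full validation:

```python
from urllib.parse import urlparse


def is_allowed_file_url(url: str) -> bool:
    """Accept only https:// URLs that name a host (hypothetical check)."""
    parsed = urlparse(url)
    # urlparse lower-cases the scheme, so "HTTPS://" is accepted too.
    return parsed.scheme == "https" and bool(parsed.hostname)


for candidate in ("https://example.com/receipt.jpg", "http://example.com/receipt.jpg"):
    print(candidate, is_allowed_file_url(candidate))
```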
We did the groundwork to seamlessly migrate the underlying image-to-text OCR to use Microsoft OCR service.
We also improved the detection of GeoNames locations for city names with accents.
We fixed a few regression issues with Invoice number extraction.
For our Australian customers, we fine-tuned the ABN extraction algorithm to improve the accuracy rate.
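One common way to sanity-check an extracted ABN candidate is the publicly documented ABN check-digit rule: subtract 1 from the first digit, multiply each digit by the weights 10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and accept the number if the weighted sum is divisible by 89. Whether Taggun's extraction uses this step is an assumption; the sketch below (with a hypothetical `is_valid_abn` helper) only illustrates the rule:

```python
# Weights from the published ABN check-digit rule.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(candidate: str) -> bool:
    """Return True if an 11-digit ABN candidate passes the check-digit test."""
    compact = candidate.replace(" ", "")
    if len(compact) != 11 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(ch) for ch in compact]
    digits[0] -= 1  # the rule subtracts 1 from the leading digit
    weighted_sum = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    return weighted_sum % 89 == 0


print(is_valid_abn("51 824 753 556"))  # True
print(is_valid_abn("51 824 753 557"))  # False
```

A check like this is cheap, so it can filter out misread digits before any registry lookup.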
Cities with shorter names (e.g. Lyon) were sometimes missed, so we fixed a weakness in our GeoNames city detection to improve the detectability of these cities.
We improved the accuracy for mobile screenshots of receipts from ride-sharing apps in Asia Pacific.
We've tinkered with the receipt number detection to improve the accuracy for receipts in Hong Kong.
For UK receipts, we have implemented HMRC services for merchantVerification, since the UK VAT validation service no longer works with VIES (an EU initiative).
We doubled the size of our cloud-hosted servers for faster response times and better stability under load.
We continued to improve merchant name accuracy, and we are happy to have received good feedback on it from our customers.
Improved the handling of receipts from Singapore, especially the currency code and amounts.
We've introduced a multi-tax feature. We can now detect and extract multiple tax rates and tax amounts (if available) on a receipt. Please reach out if you wish to enable this feature for your account.
We fixed a peculiar, intermittent bug in PDF file processing that caused 504 error responses for our customers.
We resolved a reported degradation with the ignoreMerchantName parameter. We found and fixed a weakness in our model when handling merchant names with a lower confidence level.
We improved our Swagger API documentation, so that it is easier for our new users to learn and use the endpoints.
We made major improvements on merchant name extraction to allow better detection of shop names on receipts, especially for French receipts.
Fixed a minor weakness in our detection for date format.
We've also improved the quality of the OCR text content.
For our European customers, we added a new feature to detect VAT tax IDs and retrieve up-to-date VIES merchant information from the official EU website. When you make a request to the verbose endpoints, the new property entities.merchantVerification will be available if we detect a match on the receipt or invoice.
We can also now re-align the text on curled up receipts to improve our analysis of the text.
We fixed a bug where some street addresses were being returned for French receipts.
We've improved the tax amount extraction for Australian and New Zealand receipts.
We've further improved the accuracy of date and time extraction to avoid picking up the wrong dates on utility bills.
We continue to monitor the accuracy rate and improve the accuracy of merchant names for each of our customers.
By the way, we renewed our servers' SSL certificates to keep things safe and sound.
Date and time are now extracted using Taggun's new engine. We have also enhanced features like enriching the date result with EXIF metadata in JPEG files.
We implemented an automatic feedback and training pipeline to improve Merchant Name accuracy using location data from Google Places.
We implemented OCR augmentation in the new engine to handle common OCR mistakes, for example correcting "2S Sept 2020" to "25 Sept 2020".
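A minimal sketch of this kind of correction, assuming a hypothetical table of letters that OCR confuses with digits and applying it only to one- or two-character tokens directly before a month name, so ordinary words are left alone. This is an illustration, not Taggun's engine:

```python
import re

# Hypothetical table of letters that OCR often confuses with digits.
LOOKALIKE_DIGITS = str.maketrans({"S": "5", "O": "0", "o": "0", "l": "1", "I": "1", "B": "8"})

MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
         r"Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")

# A one- or two-character "day" made of digits or look-alike letters,
# immediately followed by whitespace and a month name.
DAY_BEFORE_MONTH = re.compile(rf"\b([0-9SOolIB]{{1,2}})(?=\s+{MONTH}\b)")


def fix_day_tokens(text: str) -> str:
    """Replace look-alike letters with digits in day tokens before a month name."""
    return DAY_BEFORE_MONTH.sub(lambda m: m.group(1).translate(LOOKALIKE_DIGITS), text)


print(fix_day_tokens("Date: 2S Sept 2020"))  # Date: 25 Sept 2020
```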
In addition to Receipt and Invoice, we added a new context called Fapiao to better handle special Chinese official tax invoices in the new engine.
Merchant name is now extracted with Taggun's new engine.
Using the new engine, we will now automatically detect your document format (eg: receipt, invoice) to process them separately to achieve better results.
For companies that rely on the time component of dates, we have improved accuracy for date/time extraction especially if the time component is on a separate block in the receipt.
For receipts coming from Sweden, we have improved the extraction of Swedish company names as a merchant name entity.
We addressed a bug where future receipt dates were selected in certain scenarios.
We fixed an annoying and intermittent issue that occurred in our continuous deployment process.
We've improved the calculation of the overall confidence level so that the score decays if the receipt image is skewed or has poor OCR text quality. Our aim is to recommend marking a receipt for review if its confidence level falls below 0.8.
To improve the accuracy of our results, we now have a more robust way to rotate an image before sending it for OCR if it is rotated by 90°, 180° or 270°.
At TAGGUN, safety and security are paramount. We engaged a top security consulting company in NZ to perform security penetration tests and source code reviews for us. There were no critical findings. Phew!
We resolved a few smaller security findings immediately and made a significant improvement to our AWS security practice using a bastion account.
Entities: payment type and receipt number are now extracted using Taggun's new engine.
To prevent Server-Side Request Forgery (SSRF), we have stopped accepting IPv6 addresses as file URLs in all our API endpoints.
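A minimal sketch of detecting an IPv6 literal in a file URL with the standard `ipaddress` and `urllib.parse` modules; `is_ipv6_literal_url` is a hypothetical helper name, not Taggun's code:

```python
import ipaddress
from urllib.parse import urlparse


def is_ipv6_literal_url(url: str) -> bool:
    """Return True if the URL's host is an IPv6 address literal, e.g. https://[::1]/."""
    host = urlparse(url).hostname  # brackets are stripped: "[::1]" -> "::1"
    if host is None:
        return False
    try:
        return isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address)
    except ValueError:
        return False  # a domain name, not an IP literal


print(is_ipv6_literal_url("https://[::1]/receipt.pdf"))  # True
```

Note that this only catches literal addresses; a domain name that resolves to an internal address needs a separate check at resolution time.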
We also improved the processing of merchant names with enriched data from Foursquare.
Fixed a bug that causes intermittent language code errors.
Gracefully handled PDF files that cause request errors.
Added fast in-memory BM25 text search for more accurate and faster identification of country in the near parameter.
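For readers unfamiliar with BM25, here is a small in-memory sketch that ranks countries against the text of a `near` value. The country "documents" and the helper names are hypothetical; `k1 = 1.5` and `b = 0.75` are common default parameters, not Taggun's settings:

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class BM25Index:
    """Tiny in-memory Okapi BM25 index over short keyed documents."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.keys = list(docs)
        self.tfs = [Counter(tokenize(docs[k])) for k in self.keys]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(self.lengths)
        n = len(self.tfs)
        df = Counter(term for tf in self.tfs for term in tf)  # documents containing each term
        # Non-negative IDF variant (as used by Lucene).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def best(self, query: str) -> Optional[str]:
        """Return the best-scoring key, or None if no query term matches."""
        scores = [self.score(query, i) for i in range(len(self.keys))]
        top = max(range(len(scores)), key=scores.__getitem__)
        return self.keys[top] if scores[top] > 0 else None


# Hypothetical country "documents": names and common aliases.
countries = BM25Index({
    "AU": "australia aus",
    "NZ": "new zealand nzl",
    "US": "united states usa america",
    "GB": "united kingdom great britain uk gbr",
})
print(countries.best("Auckland, New Zealand"))  # NZ
```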
More tinkering on our receipt scanning accuracy benchmarking tool, so we can streamline the process to ingest data from the feedback API.
We have also improved the payment types extraction to support a wider variety of payment processors.
We improved the way we handle receipts from France and Hong Kong, particularly the accuracy of the total amount.
We added a new feature to return the merchant postal code when detected on a receipt. This is great if you are using TAGGUN to validate receipts as proof of purchase for digital loyalty campaigns.
We have added enhanced merchant address detection for the Australian market.
We have upgraded our third-party dependencies to avoid any security vulnerabilities. A nice side effect is we now have a brand new Swagger UI for our API tester and documentation.
New feature to detect city or country name on a receipt. This is great if you want to pre-populate the location field of the expense for your app.
For those of you who like to have separate billing for different sub-accounts under one API key, please let us know. We have now added the ability for you to do just that.
We added the ability to extract payment type (Eg: Visa, MasterCard, or Cash) from the receipts.
Fighting spammers. We had to implement some high-level protection against abuse on our demo site. Annoying cat and mouse game. But, fun stuff.
We rolled out a new model to detect location from the text of the receipts and invoices.
Previously, we relied on the "near" parameter or the IP address to detect the location. Now, we have improved the algorithm to better detect the city and country name from the receipt itself.
Previously, if the request header content-type is not set as application/json, we would still process the content as JSON type. We're now performing stricter checks on the content-type request header for safety.
For those who love to get the accurate time from the receipts, good news, you can now set "extractTime" to true and we will return the time if we can detect it from the receipt.
Occasionally, a valid PDF file will be deemed as unsafe by our security check... Now we allow some tolerance for PDF files with an unexpected signature in the file bytes.
For those who serve Australian customers: the Australian Business Number (ABN) is mandatory on Australian invoices. We now integrate with the Australian government's API to return enriched data from the ABN registry.
Added new feature to detect location from the text of the receipts and invoices.
We received feedback that the non-standard 418 error status code was causing applications to retry requests when the file is not a receipt. So we now return success code 200 with empty results for each property if no text is detected on the receipt.
If the file type is PDF, and content-type is "application/octet-stream" (a common mistake made by other applications), we will now try to handle this as a PDF file gracefully for you as application/pdf content-type.
If we detect a mismatch between the content-type and the file type during our security check, we now return the 400 error status code.
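A minimal sketch of the content-type handling in the two entries above, assuming a small hypothetical allow-list of magic-byte signatures (the helper names are also hypothetical). The tolerance for PDFs with unexpected leading bytes mentioned earlier is not modelled:

```python
from typing import Optional, Tuple

# Hypothetical allow-list of file signatures ("magic bytes").
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature starts the file, if any."""
    for mime, signature in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


def resolve_content_type(declared: str, data: bytes) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, content type to process the file as)."""
    detected = sniff_type(data)
    if declared == "application/octet-stream" and detected == "application/pdf":
        return 200, "application/pdf"  # a generic upload that is really a PDF
    if detected is None or detected != declared:
        return 400, None  # declared type and file contents disagree
    return 200, detected
```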
Added new feature to integrate with Australian Business Number (ABN) and return enriched data from ABN registry.
Improved error handling for files that are not receipts
Improved handling of PDF files with application/octet-stream content-type
Improved support for multiple languages with new Natural Language Processing and Named Entity Recognition packages and processing.
Improved date accuracy for US date format
Added new feature to extract QR Data when QR barcode is detected on the receipt or files.
Added new feature to extract special information for official Chinese Fapiao Invoices.
Improved accuracy on totalAmount to avoid picking up "total cash paid"
Improved accuracy for invoiceNumber
Improved PDF processing to have a wider support for non-standard fonts.
Improved request performance for PDF files processing.
Added strict request parameters validation for MD5.
Added itemCounts extraction to detect the number of items on a receipt.
Added support for Hungarian receipts.
Improved overall accuracy extraction with better detection of space and symbols in the OCR text.
Improved date extraction by using keywords to pick up the correct invoice date and to avoid other dates such as delivery date, due date, period date, etc.
Improved stability and performance by tuning the memory usage of containers.
Improved request response time with caching of merchantName. The cache refreshes every 24 hours.
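A minimal sketch of a per-key time-to-live cache with a 24-hour expiry; the class name is hypothetical, and whether Taggun expires entries individually or refreshes the whole cache at once is not stated:

```python
import time
from typing import Callable, Dict, Tuple

DAY_SECONDS = 24 * 60 * 60


class TTLCache:
    """Per-key cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = DAY_SECONDS, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit
        value = compute(key)  # miss or expired: recompute and store
        self._entries[key] = (now, value)
        return value
```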
Updated all third party dependencies to avoid security vulnerabilities.
Added support for Romanian receipts.
Improved invoice number extraction by using filename.
Improved image preprocessing optimisation for OCR text.
Added ignoreMerchantName parameter to avoid extracting the invoice recipient's name as the merchant name.
Improved invoice number extraction.
Added a new server in the UK.
Improved amount and tax extraction for Nordic receipts.
Improved tax extraction for GST in Australia and New Zealand.
Upgraded 3rd party library dependencies.
Improved API performance by reducing memory usage.
Upgraded all software dependencies to the latest stable version.
Improved API performance by reducing up to 10% processing time.
Deployed a new server in the UK region to handle the increased load in Europe.
Added inspection and validation of uploaded files against the allowed file types.
Added near parameter to accept a geo location to validate and enrich the merchant details.
Added currencyCode property to totalAmount and taxAmount, which returns the three-letter ISO currency code.
Removed redundant endpoint /api/article/v1/verbose/file.
Removed redundant endpoint /api/locale/v1/location.
Added merchantCity, merchantState and merchantCountryCode properties, returned when merchantName or merchantAddress is found.
Added a new endpoint /api/account/v1/merchantname/add to help improve the accuracy of merchant names for your account.
Improved accuracy for merchantName and merchantTypes with an improved algorithm and merchant information data provider.
Added a feature to extract and detect spaces as thousand separators for receipts in Belgium. This feature is not enabled by default; email to request this feature to be enabled for your account.
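A minimal sketch of parsing such amounts, assuming a comma decimal separator and groups of three digits separated by a regular, no-break, or narrow no-break space; `parse_be_amount` is a hypothetical name:

```python
import re
from decimal import Decimal

# Regular, no-break and narrow no-break spaces used as thousand separators.
SPACES = " \u00a0\u202f"
AMOUNT = re.compile(rf"\d{{1,3}}(?:[{SPACES}]\d{{3}})*,\d{{2}}")


def parse_be_amount(text: str) -> Decimal:
    """Parse an amount like '1 234,56' (space thousands, comma decimals)."""
    s = text.strip()
    if not AMOUNT.fullmatch(s):
        raise ValueError(f"not a space-grouped amount: {text!r}")
    return Decimal(re.sub(f"[{SPACES}]", "", s).replace(",", "."))


print(parse_be_amount("1 234,56"))  # 1234.56
```

Requiring exact three-digit groups rejects strings like "12 34,56", which are more likely two separate numbers than one amount.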
Improved accuracy for receipts from Brazil.
Added feature to extract VAT numbers from Belgium. This feature is not enabled by default; email to request this feature to be enabled for your account.
Improved accuracy for receipts from Belgium
When API request encounters an error, the endpoint now returns HTTP status code 400 with a short description of the error.
Improved accuracy for date format detection and extraction.
Released an alpha version of invoice number and IBAN (bank account number) extraction under the entities property.
Improved line item description and amount detection and accuracy.
Released an alpha version of line item description and amount under the lineAmounts property. This detects the descriptions and amounts found on the receipt.
Enhanced merchantName, merchantAddress and merchantType to feed production data forward as training data for machine learning to improve accuracy.
Increased accuracy by rotating the image 90 degrees if the receipt is detected to be landscape.
Added support for extracting dates without year information, e.g. JULY'24. The engine will assume the year is the current year.
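A minimal sketch, assuming (as the example implies) that the number after the apostrophe is the day of the month; `parse_month_day` is a hypothetical helper, and `today` is injectable so the result is testable:

```python
from datetime import date
from typing import Optional

MONTH_ABBREVIATIONS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                       "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def parse_month_day(text: str, today: Optional[date] = None) -> date:
    """Parse a year-less date like "JULY'24", assuming the current year."""
    today = today or date.today()
    month_part, sep, day_part = text.strip().upper().partition("'")
    if not sep:
        raise ValueError(f"expected MONTH'DD, got {text!r}")
    # Match on the first three letters; .index raises ValueError for unknown months.
    month = MONTH_ABBREVIATIONS.index(month_part.strip()[:3]) + 1
    return date(today.year, month, int(day_part))


print(parse_month_day("JULY'24", today=date(2021, 1, 1)))  # 2021-07-24
```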
Reduced time to load files for /api/receipt/v1/simple/url by using an in-memory buffer.
Added /api/receipt/v1/match/file endpoint to recognise whether an image of a receipt contains certain keywords (supplied by the client) and return a result indicating whether the image is a likely or unlikely match for the supplied keywords.
Added optional request parameters to support languages other than English for OCR. Currently supporting: en, es, fr, jp, he, iw
Added a new property numbers in verbose mode to return any numbers detected on a receipt or article. This is useful for those who want to detect reference numbers or codes on a receipt.
Added line amounts to the verbose results. Line amounts are one or more amounts found on the receipt that sum exactly to the detected total amount of the receipt. This is implemented using the subset sum algorithm. It also greatly improves the confidenceLevel calculation.
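The entry names the subset sum algorithm. A standard dynamic-programming sketch of it, working in integer cents and assuming all amounts are positive with at most two decimal places (discount lines would need a different bound), looks like this; `find_line_amounts` is a hypothetical name and this is an illustration, not Taggun's implementation:

```python
from decimal import Decimal
from typing import Dict, List, Optional, Tuple


def find_line_amounts(amounts: List[Decimal], total: Decimal) -> Optional[List[Decimal]]:
    """Return a subset of positive `amounts` that sums exactly to `total`, or None."""
    cents = [int(a * 100) for a in amounts]
    target = int(total * 100)
    # parent[s] = (previous sum, index of amount added) for every reachable sum s.
    parent: Dict[int, Optional[Tuple[int, int]]] = {0: None}
    for i, c in enumerate(cents):
        for s in list(parent):  # snapshot: each amount is used at most once
            t = s + c
            if t <= target and t not in parent:
                parent[t] = (s, i)
    if target not in parent:
        return None
    # Walk the parent links back from the target to recover the chosen amounts.
    subset, s = [], target
    while parent[s] is not None:
        prev, i = parent[s]
        subset.append(amounts[i])
        s = prev
    return subset[::-1]


lines = [Decimal("4.50"), Decimal("12.00"), Decimal("3.25"), Decimal("7.75")]
print(find_line_amounts(lines, Decimal("19.75")))
# [Decimal('4.50'), Decimal('12.00'), Decimal('3.25')]
```

The table grows with the total in cents times the number of amounts, which stays small for receipt-sized totals.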
Published this help documentation for Taggun. Even though the Swagger documentation is great for up-to-date API information, there are other aspects of Taggun that I would like to share with everyone.
Added /api/article/v1/verbose/file endpoint to perform OCR on documents other than receipts. This is useful for a simple image-to-text dump that also captures any metadata, such as dates and amounts, found in the image.
Modified the receipt endpoints to make the response structure uniform and easier to understand.
Some of our customers use ID to identify and map the location accurately. We have also improved this feature so that it chooses the right location.
Moved a cloud-hosted server from the UK to the France region to serve our customers in the EU since Brexit. The UK endpoint will be removed.
Occasionally, the PDF rendering can cause some issues in our API. We have fixed this with a fork of Google's PDFium library for node. And we have released it as an open-source package at
We use a list of city and country names from with a population above 150,000 to train the NER model.
Added a new endpoint and data centre in the UK, providing lower latency and faster throughput for customers in the Europe region.
Increased accuracy for extracting totalAmount using .
Converted to use as the PDF reader, reducing PDF conversion time by 70%.
Added a dump of all detected amounts on a receipt to meet a feature request from a Reddit user. Follow the to see the Reddit post.
Started the Taggun blog. Apparently a blog is the most effective tool for digital marketing?! Check it out -->
Woohoo... Taggun has a bot!! His name is Marvin and he is Taggun's admin assistant. You can ask him to do a few basic administrative tasks now. Get to know more about .