This particular Python error message indicates that you’ve been strung out to dry – literally! You fed a Python function that expects to convert a string into an integer variable something other than an integer object. This incorrect argument causes python to halt and catch fire (in good Pythonic fashion)…
The two usual suspects are either:
- Decimal points
- Character values in the middle of a number
- Rarely, an inappropriate argument type due to errors in your code (fairly easy to spot)
This type of value error is especially common if you’re processing number data which has been imported from another source, such as a flat file or a SQL database. Be wary of financial systems; many use data type(s) with a default value that includes a decimal number system, since their main goal is tracking money (a decimal value).
Description of the error
Let’s translate this error message into English. A “python invalid literal” is a fancy way to saying the the string representation you gave it doesn’t match what it was expecting. You’ve managed to feed something other than a digit to a string argument that expected digits.
Digging deeper, the real root cause is generally driven by how you’re pulling your data. While you can often bypass this using something like the python try / catch or populating fields with a default value, you may want to dig deeper. This is generally due to either:
- Errors in the file layout that you’re using for the input file
- Errors in the specifications for your data type, generally because someone told you a decimal (float value) was an integer value.
- The data that was passed to you doesn’t actually match the specifications anyway, since someone in the upstream process found a way to jam non-conforming values into the database (surprisingly common)
Either way, this is something you want to figure out from an overall data quality perspective, since the same error may affect other fields. I’ve found a lot of deep process data quality issues from silly stuff like a stray “float string” value.
How to correct the error
There are multiple ways to address the error.
The best approach, recommended if you’re going to rely on that variable for your core analysis, is to treat the error as a warning and do a full audit of the data values you’re loading into the project. Then fix the error at the source. Check your file layout or database mapping, look for any non integer value in the feed, and track it back to the source. Hunt down the bad string value or period character (chars) and ask why it is there. The mere existence of this error indicates that your upstream data doesn’t match the specs.
Assuming you don’t fix it at the source, you have many options to filter and bypass this problem. There are many ways to work around this error with good exception handling and preventative data hygiene efforts during data conversion.
You can use the Python isdigit() function to verify that the string argument input is a number.
Along the same lines, the int() function will return an error message if the input is not a number. Put this in a try block.
If you want to “clean” the string value by force converting a floating point number into an integer value, you can fix this with two function calls. First, pass the string to the float() function to turn it into a number. Then pass the float value result to the int function to convert it into an integer value (int type). These two methods will remove the error message. This decimal numbers error can be fixed by combining the functions into int (float(“string”)).
If you see a space character in your data, you must review your input file specifications. That generally means you’re not reading the file right or are pulling from a seriously corrupted database field. That kind of error MUST be fixed at the source or by using a default base value for the affected records. Ironically this kind of value can have predictive value in many models (any data record that is sufficiently messed up often is an indicator of events going seriously off track in the real world.)
Going All the Way Upstream
This error would be easy if you had control over the files used by users to input value data.
You cannot control the actions of users so you must verify that they are correct before you put them in a function. Also, arrange your code to allow for different data arrangements.