Preventing System Failures

Published: 2019-12-11 08:00:00

There has been a steady shift from manual systems to software systems, and software is becoming the backbone of organizations worldwide. As a result, using software is increasingly unavoidable for individuals and organizations alike. Software systems bring added benefits, but these are not always easy to identify, especially if the current manual or partly software-based system is not what runs the business. Software may be introduced for easier control, greater efficiency, lower operational costs, the handling of high transaction volumes, and so forth. If it is not well designed and properly thought out, however, it may bring no advantage at all. Software can bring two kinds of benefit: tactical and strategic. Tactical benefits improve day-to-day organizational transactions, while strategic benefits improve the nature or abilities of the organization.

Because software is used in high-risk and mission-critical applications that affect both human lives and critical transactions, it is very important that it be free of defects and remain stable for as long as it is in use. Yet despite testing for anticipated failures and efforts to eliminate defects, failures still occur. Table 1 summarizes some of the most notable software failures of the period 2012-2016, which caused not only disruption but also the loss of resources. These failures could have been avoided.

Year  Month      Organization        Effect
2012  August     Knight Capital      $440 million loss in cash
2012  November   United Airlines     Thousands of passengers delayed
2012  November   EnergyAustralia     Wrong late payment charges
2013  June       Facebook            Private data of six million users exposed
2013  August     US Nasdaq           Online trade shutdown
2013  October    US Government       30% insurance applications problems
2014  November   Microsoft           Users could not access apps and data
2014  December   Brunswick Electric  Power failure
2014  December   Amazon              Drastic price reductions to 1p
2015  March      US Airforce         Incorrect detection of targets
2015  August     HSBC                275,000 payments delayed
2016  January    NASA                Global GPS timing anomaly
2016  February   British Gas         People's homes temp soar to 32C
2016  September  British Airways     Thousands of passengers delayed

Table 1: Summary of major software failures, 2012-2016

Amazon System Software Failure

The Amazon software glitch occurred in the run-up to Christmas 2014 and lasted almost an hour. The problem began when software used by third-party retailers to keep their products among the cheapest on the market malfunctioned and cut the prices of thousands of items to as little as 1p. Games, mattresses, batteries, clothing, headphones, and other goods that normally cost far more were suddenly on sale for 1p. The glitch left many small family-run businesses with heavy losses, and some faced closure. One trader, a dealer in games and toys, said it could have cost him £100,000. Because of the sudden drop in price, shoppers bought multiple units of each item, with some buying up to 100 units of items which usually retail at 100,000 (Neate). Although Amazon tried to cancel orders that had not yet been dispatched, sellers complained that the cancellations hurt their customer ratings. Some traders called to request cancellations, but not all requests succeeded: once the automated systems had set an order in motion, it was difficult to stop, because there was little room for human intervention (Neate). The failure caused not only huge losses for sellers but also reduced confidence in the automated pricing system. It could have been avoided had the software been designed in anticipation of this kind of failure. For example, the designers could have included a safeguard that prevents prices from falling below the cost price or a standard minimum.
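The safeguard suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of the repricing tool involved in the incident: the names Listing, safe_price, and needs_review, and the 50% drop limit, are hypothetical and introduced here. The sketch clamps any proposed price to the higher of the cost price and a seller-set minimum, and flags unusually steep cuts so that a person can review them before they go live.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Listing:
    """Hypothetical listing record; field names are assumptions."""
    sku: str
    cost_price: Decimal   # what the seller paid for the item
    floor_price: Decimal  # seller-defined minimum selling price


def safe_price(listing: Listing, proposed: Decimal) -> Decimal:
    """Return the proposed price, clamped so it never drops below the floor."""
    # The effective floor is the higher of cost price and the seller's minimum.
    floor = max(listing.cost_price, listing.floor_price)
    return max(proposed, floor)


def needs_review(current: Decimal, proposed: Decimal,
                 max_drop: Decimal = Decimal("0.50")) -> bool:
    """Flag a change for human review when it cuts the price by more than
    max_drop, expressed as a fraction of the current price (assumed 50%)."""
    if current <= 0:
        return True  # a non-positive current price is itself suspicious
    drop = (current - proposed) / current
    return drop > max_drop
```

With such a guard in place, a faulty repricing run that proposes 1p for a game bought at 12.00 with a 15.00 minimum would list it at 15.00 instead, and the attempted cut would be held for review rather than applied automatically.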

HSBC System Software Failure

The HSBC system failure occurred when a glitch hit the bank's payments software. The glitch prevented approximately 275,000 people in the UK from receiving their pay cheques before the bank holiday weekend. The failure was partly attributed to the aging computer systems used by some banks. The bank's management revealed that it was caused by a flaw in the information contained in a file presented to Bacs (Morris). Bacs is the system British banks use to process millions of direct credits and direct debits. The fault prevented some HSBC customers from completing their payments, which affected companies trying to settle invoices or pay wages to staff. It caused great inconvenience to the bank's direct and indirect customers and led the bank to compensate those affected. The failure was traced to the systems involved in the Bacs submission, which have been built up over several decades through product acquisitions and product launches into a complex and costly patchwork. These aging systems are further strained by the growing number of customers who use phone apps for online transactions. The incident shows that the bank's IT systems need an upgrade, and the bank should therefore invest in new, robust software to prevent similar problems in future.
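Since the failure was attributed to flawed information in a file presented to Bacs, one inexpensive defence is to check a payment batch before it is submitted. The Python sketch below is a simplified illustration only. It assumes a hypothetical in-memory batch with a declared record count and total; the Payment fields and the validate_batch function are invented for this example and do not reflect the real Bacs file format or HSBC's systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    """Hypothetical payment record; not the real Bacs layout."""
    account: str       # assumed to be an 8-digit UK account number
    sort_code: str     # assumed to be a 6-digit UK sort code
    amount_pence: int  # amount in pence, to avoid floating-point rounding


def validate_batch(payments: list[Payment], declared_count: int,
                   declared_total_pence: int) -> list[str]:
    """Return a list of problems found; an empty list means the batch passed."""
    errors = []
    # Batch-level checks: the records must agree with the declared summary.
    if len(payments) != declared_count:
        errors.append(f"record count {len(payments)} != declared {declared_count}")
    total = sum(p.amount_pence for p in payments)
    if total != declared_total_pence:
        errors.append(f"total {total} != declared {declared_total_pence}")
    # Record-level checks: each field must have the expected shape.
    for i, p in enumerate(payments):
        if not (p.account.isdigit() and len(p.account) == 8):
            errors.append(f"record {i}: bad account number")
        if not (p.sort_code.isdigit() and len(p.sort_code) == 6):
            errors.append(f"record {i}: bad sort code")
        if p.amount_pence <= 0:
            errors.append(f"record {i}: non-positive amount")
    return errors
```

A batch would be submitted only when validate_batch returns an empty list; otherwise the listed problems go back to the team that produced the file.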

Annotated Bibliography

Dalal, Sandeep, and Rajender Singh Chhillar. "Case studies of most common and severe types of software system failure." International Journal of Advanced Research in Computer Science and Software Engineering 2.8 (2012): 341-347.

In this article, the authors analyzed several case studies of common and severe software system failures. From these cases, they concluded that software failures can result in financial losses and loss of life. They also found that software failures waste effort and time, which triggers intangible losses such as stress, discomfort, and damage to reputation, goodwill, and peace of mind.

To gain in-depth knowledge of various aspects of software systems, the authors reviewed past literature on system failure. This included the definition of software failure, statistics on the prevalence of software failures in industry, and the factors that determine the severity of a failure. Those factors include the number of users of the software, the involvement of financial transactions, the impact on people's lives, and the nature of use (home, space, defense, aviation, missile, satellite, and so forth). Through the case studies, the authors were able to identify common causes of software failure.

The authors used credible sources and studied major software failures of the past. A limitation of the study is the small number of case studies used. The findings on the causes of software failure and on the factors that determine its severity therefore cannot be generalized to all common and severe software failures. Even so, the article provides a good overview of the definition of software failure, its causes, the factors that determine its severity, its impacts, and practical examples of past failures.

Pitakrat, Teerat, Andre van Hoorn, and Lars Grunske. "Increasing dependability of component-based software systems by online failure prediction (short paper)." Dependable Computing Conference (EDCC), 2014 Tenth European. IEEE, 2014.

Pitakrat, van Hoorn, and Grunske sought to offer a solution to software failures by developing an approach that predicts failures through online monitoring. Their solution indicates not only when a failure will occur but also where it will occur. The authors' proposed solution improves the dependability of a system by predicting possible problems and issuing a warning before a failure occurs.
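The general idea of warning before a failure, and saying where it is likely to happen, can be illustrated with a deliberately simple Python sketch. This is not the authors' prediction model. It swaps in a much cruder technique: an exponentially weighted moving average of one metric per component, compared against a warning level set below the point of failure. The class ComponentMonitor, the function components_at_risk, the metric values, and the thresholds are all assumptions made for illustration.

```python
class ComponentMonitor:
    """Tracks an exponentially weighted moving average (EWMA) of one metric."""

    def __init__(self, warn_level: float, alpha: float = 0.3) -> None:
        self.warn_level = warn_level  # warn when the smoothed metric exceeds this
        self.alpha = alpha            # weight given to the newest reading
        self.smoothed = None          # no reading seen yet

    def observe(self, value: float) -> bool:
        """Fold in a new reading; return True if a warning should be raised."""
        if self.smoothed is None:
            self.smoothed = value
        else:
            self.smoothed = self.alpha * value + (1 - self.alpha) * self.smoothed
        return self.smoothed > self.warn_level


def components_at_risk(monitors: dict, readings: dict) -> list:
    """Return the names of components whose smoothed metric crossed its
    warning level after the latest readings (the 'where' of the warning)."""
    return [name for name, value in readings.items()
            if monitors[name].observe(value)]
```

Feeding each monitoring interval's readings into components_at_risk yields the list of components to warn about. A real predictor would also use the system's architecture to propagate risk between dependent components, which this sketch leaves out.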

To demonstrate how their proposal works, the authors divided their work into several parts. First, they described their envisioned approach, which grew out of the challenges encountered when predicting failures in large systems. They then described the architectural model of the proposed solution and the kinds of prediction models it employs. This was followed by the construction of a prediction model for each architectural component. Finally, the authors discussed related work on online prediction models.

Software failures remain a challenge for organizations, so a way of anticipating them would be a relief to many. This article is therefore relevant to the topic. The authors explained the limitations of their current approach and outlined a plan for improving the model in future. The sources used in the article are relevant and credible. One limitation of this work is that the approach is not based on real-world data, lab experiments, or simulation.

Hamill, Maggie, and Katerina Goseva-Popstojanova. "Exploring fault types, detection activities, and failure severity in an evolving safety-critical software system." Software Quality Journal 23.2 (2015): 229-265.

The paper aims to fill a gap in establishing links from software faults to observed or potential failures. To that end, the authors studied the types of faults responsible for software failures, the detection activities under way when failures were reported, and the severity of those failures. They also established associations among these attributes and examined trends both within and across releases.

To carry out their study, the authors used data extracted from a mission-critical NASA software system. Specifically, they analyzed 21 large-scale software components comprising 8,000 files and millions of lines of code. The analysis used in the paper is robust, and the results are therefore dependable. Besides using a case study, the authors also reviewed related work by others. They collected quality data and were able to establish relationships between detection activities, fault types, and failure severity. As a result, they were in a position to associate faults in the system with failures, and they went on to identify the nature of the detection activity under way when failures surfaced. The authors drew on many sources, all of them credible. One limitation of this study is that it relies on a single case study, which limits how far the results can be generalized.

Fronza, Ilenia, Alberto Sillitti, Giancarlo Succi, Mikko Terho, and Jelena Vlasenko. "Failure prediction based on log files using Random Indexing and Support Vector Machines." Journal of Systems and Software 86.1 (2013): 2-11.

In this article, the authors proposed a failure prediction model based on log files. The model uses Random Indexing and Support Vector Machines. The authors developed it to address the impact of software failures, which can require an unexpected amount of time and resources to recover from. They recognized the need for accurate predictions that can help prevent software failures.
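To make the encoding step more concrete, the Python sketch below shows a simplified, random-indexing-style representation of a sequence of log events. It is an illustration under assumptions, not the authors' implementation: the function names, the vector dimension of 64, and the four non-zero entries per vector are choices made here. Each distinct event gets a sparse vector of +1 and -1 entries at pseudo-random positions, and a sequence is represented by the sum of its events' vectors. The classification step, which the article performs with Support Vector Machines, is left out; the resulting vectors would be passed to a classifier from a machine learning library.

```python
import math
import random


def index_vector(token: str, dim: int = 64, nonzero: int = 4) -> list:
    """Sparse ternary index vector for a token.

    Seeding the generator with the token makes the vector reproducible,
    so the same log event always maps to the same vector.
    """
    rng = random.Random(token)
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def sequence_vector(tokens: list, dim: int = 64) -> list:
    """Represent a sequence of log events as the sum of their index vectors."""
    total = [0] * dim
    for token in tokens:
        for i, value in enumerate(index_vector(token, dim)):
            total[i] += value
    return total


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Sequences that share many events end up with similar vectors, which is what lets a classifier separate sequences that preceded failures from those that did not. The fixed dimension keeps the representation compact however many distinct events appear in the logs.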

To address the gap between existing prediction models, the authors conducted a literature review of related work, using substantial, high-quality sources on the topic. They then gave an in-depth description of the major terms: log files, Random Indexing, and Support Vector Machines. This was followed by a description of their proposed approach. The authors then performed an experim...

